Feb-15-2018, 01:04 AM
For some reason when I try to scrape links from any RSS feed it saves them with improper syntax.
Example, instead of:
['<link>http://url.com/1/</link>', '<link>http://url.com/2/</link>', '<link>http://url.com/3/</link>']
It gives me results like this:
[<link>http://url.com/1/</link>, <link>http://url.com/2/</link>, <link>http://url.com/3/</link>]
When I try to pull the inner text and just get a clean link list with no tags, I get errors:
"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
Here's my full code:
import requests
from bs4 import BeautifulSoup

page = requests.get(
    "http://www.cbc.ca/cmlink/rss-topstories",
    headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) "
                           "AppleWebKit/537.36 (KHTML, like Gecko) "
                           "Chrome/60.0.3112.90 Safari/537.36"})
soup = BeautifulSoup(page.content, features="xml")
link_list = soup.find_all('link')
link_list = link_list.text  # this is the line that raises the AttributeError

Any ideas why the list it scrapes is broken?
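For what it's worth, the list is probably not broken. find_all() returns a ResultSet, which is a list of Tag objects, and printing a Tag shows its markup without quotes. The AttributeError most likely comes from asking the whole list for .text instead of asking each tag in it. Here is a minimal sketch of one way around it, assuming lxml is installed for the "xml" parser. SAMPLE_RSS is a made-up stand-in for the real feed so the snippet runs without a network request, and the names link_tags and links are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical sample feed standing in for the real RSS response.
SAMPLE_RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item><title>One</title><link>http://url.com/1/</link></item>
    <item><title>Two</title><link>http://url.com/2/</link></item>
    <item><title>Three</title><link>http://url.com/3/</link></item>
  </channel>
</rss>"""

soup = BeautifulSoup(SAMPLE_RSS, features="xml")

# find_all() returns a ResultSet: a list of Tag objects, not strings.
link_tags = soup.find_all("link")

# .text exists on each individual Tag, so read it tag by tag.
links = [tag.text for tag in link_tags]

print(links)
```

With the real feed, the same list comprehension should work on the result of soup.find_all('link') from the code above. Note that a real feed usually also has a channel-level link element, so the first entry may be the site URL rather than a story.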