Thread: parsing table (/thread-9737.html) - Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
parsing table - ian - Apr-25-2018

A table has one <tbody> tag with 100 <tr> tags in it, but I can only find the table itself, and I get the error below when parsing tags inside the table. Is it possible that some tables cannot be parsed? I use Python 3.6. Thanks.

    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url)
    html = BeautifulSoup(page.content, 'html.parser')
    table = html.find_all('table')
    print(len(table))
    table.find('tbody')

Output:

    1
    Traceback (most recent call last):
      File "C:/1 - Run/TradeOpen/Python Scripts/G&M - RealTime.py", line 12, in <module>
        table.find('thead')
      File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\element.py", line 1807, in __getattr__
        "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
    AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

RE: parsing table - nilamo - Apr-25-2018

(Apr-25-2018, 08:16 PM)ian Wrote: AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

So even though there's only one table, find_all() still returns a list of tables that just happens to have only one element. You can either use .find() instead of .find_all() (like the error suggests), or you can iterate through the one-element list, like so:

    for each_table in table:
        print(each_table.find("tbody"))

RE: parsing table - ian - Apr-25-2018

Thanks. There is progress: no error anymore. But I still cannot find any tag in the table.

    table = html.find_all("table")
    for each_table in table:
        print(each_table.find("tbody"))

Output:

    None

    table = html.find("table")
    for each_table in table:
        print(each_table.find("tbody"))

Output:

    -1

RE: parsing table - nilamo - Apr-26-2018

Does the page actually have a tbody in it?
Most people just skip that, and I don't think BeautifulSoup adds tags where there aren't any.

RE: parsing table - ian - Apr-26-2018

Yes, it does. The table has the following structure.

    <table class=..(lots of properties)..>
      <thead>...</thead>
      <tbody>
        <tr ...>...</tr>
        <tr ...>...</tr>
        ...
        <tr ...>...</tr>
        <tr ...>...</tr>
        <tr ...>...</tr>
      </tbody>
    </table>

RE: parsing table - snippsat - Apr-26-2018

That should be okay to parse.

    from bs4 import BeautifulSoup

    html_data = '''\
    <table class='Foo'>
      <thead>...</thead>
      <tbody>
        <tr>1</tr>
        <tr>2</tr>
        <tr>4</tr>
        <tr>5</tr>
        <tr>6</tr>
      </tbody>
    </table>'''

    soup = BeautifulSoup(html_data, 'lxml')

Test:

    >>> table = soup.find('table')
    >>> tbody = table.find('tbody')
    >>> tbody
    <tbody>
    <tr>1</tr>
    <tr>2</tr>
    <tr>4</tr>
    <tr>5</tr>
    <tr>6</tr>
    </tbody>
    >>> for item in tbody.find_all('tr'):
    ...     print(item.text)
    1
    2
    4
    5
    6
    >>> # CSS selector
    >>> soup.select('tr')
    [<tr>1</tr>, <tr>2</tr>, <tr>4</tr>, <tr>5</tr>, <tr>6</tr>]
    >>> [int(i.text) for i in soup.select('tr')]
    [1, 2, 4, 5, 6]

RE: parsing table - ian - Apr-26-2018

Yes, a typical table is OK to parse. The table at the URL below has one <tbody> with 100 <tr>s, but it cannot be found by the following code.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.theglobeandmail.com/investing/markets/stocks/market-leaders/'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    table = soup.find_all("table")
    print(table[0])
    for each_table in table:
        print(each_table.find("tbody"))

RE: parsing table - snippsat - Apr-26-2018

Can you guess why? Hint: it's used all over the web because of its unique position, and it has "good parts".

RE: parsing table - nilamo - Apr-27-2018

You should take a look at the source for that URL. There are no tables on that page. The data is all loaded via JavaScript. So, instead of scraping the page, just call the data.json URL, and you won't need to scrape anything.
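nilamo's point can be checked without a browser by looking only at what the server sends back. The sketch below is a minimal illustration, not the real page: RAW_SOURCE is a made-up stand-in for the bytes requests.get(url).content might return from a page that builds its table in the browser, and count_static_rows and has_scripts are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a page whose table is built by JavaScript.
# This is NOT the actual source of the page discussed in the thread.
RAW_SOURCE = """\
<html><body>
<div id="market-leaders"></div>
<script>/* a script would fetch JSON and build the table here */</script>
</body></html>"""


def count_static_rows(html):
    """Count the <tr> elements present in the raw HTML itself."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("tr"))


def has_scripts(html):
    """Report whether the raw HTML contains any <script> elements."""
    soup = BeautifulSoup(html, "html.parser")
    return bool(soup.find_all("script"))


# No rows in the raw source plus scripts on the page is a strong hint
# that the rows are added later, in the browser.
print(count_static_rows(RAW_SOURCE))
print(has_scripts(RAW_SOURCE))
```

If the row count is zero while the browser shows rows, BeautifulSoup is not at fault: it only sees the markup it was given, and it does not run scripts.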
RE: parsing table - ian - Apr-27-2018

When I use 'Inspect element' in IE11, I can see all the tags in that table.
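For reference, the three results seen earlier in the thread (the AttributeError, None, and -1) come from calling .find() on three different kinds of object. The sketch below uses a small made-up table only to show those types; describe is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, ResultSet, Tag

# Small made-up table, used only to show the object types involved.
html = "<table>\n<tbody><tr><td>1</td></tr></tbody></table>"
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")  # ResultSet: a list of Tag objects
first = soup.find("table")       # a single Tag (None if nothing matches)


def describe(node):
    """Say what calling .find("tbody") on this node amounts to."""
    if isinstance(node, Tag):
        return "tag search"
    if isinstance(node, NavigableString):
        return "str.find"  # NavigableString subclasses str
    return "unknown"


# Iterating over a Tag walks its direct children, text nodes included.
kinds = [describe(child) for child in first]
print(kinds)

# A text node uses str.find, which returns -1 when the substring is absent.
print(first.contents[0].find("tbody"))

# A ResultSet has no .find(); the call has to go on one of its elements.
try:
    tables.find("tbody")
    result_set_error = None
except AttributeError as exc:
    result_set_error = type(exc).__name__
print(result_set_error)
```

So looping over find_all("table") visits tables, while looping over find("table") visits that one table's children, which explains why the second loop printed -1 instead of a tag.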