 parsing table
#1
A table has one <tbody> tag containing 100 <tr> rows, but I can only find the table itself, and I get the error below when I try to parse the tags inside it. Is it possible that some tables cannot be parsed? I am using Python 3.6. Thanks.

page = requests.get(url)
html = BeautifulSoup(page.content,'html.parser')
table = html.find_all('table')
print(len(table))
table.find('tbody')
1
Traceback (most recent call last):
File "C:/1 - Run/TradeOpen/Python Scripts/G&M - RealTime.py", line 12, in <module>
table.find('thead')
File "C:\Users\Ian\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\element.py", line 1807, in __getattr__
"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
>>>
#2
(Apr-25-2018, 08:16 PM)ian Wrote: AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

So even though there's only one table, find_all() still returns a list of tables that just happens to have a single element. You can either use .find() instead of .find_all() (as the error suggests), or iterate through the one-element list, like so:
for each_table in table:
    print(each_table.find("tbody"))
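
To make the difference concrete, here is a minimal sketch. The sample HTML is made up for illustration and is not the page from the question; it only shows that find_all() returns a list-like ResultSet while find() returns a single Tag.

```python
from bs4 import BeautifulSoup

# Hypothetical one-table document, used only for illustration.
sample_html = "<table><tbody><tr><td>1</td></tr></tbody></table>"
soup = BeautifulSoup(sample_html, "html.parser")

tables = soup.find_all("table")   # ResultSet (list-like), even with one match
first_table = soup.find("table")  # a single Tag, or None if nothing matches

# A ResultSet has to be indexed or iterated before calling .find() on an item.
print(type(tables).__name__, len(tables))
print(tables[0].find("tbody") is not None)
print(first_table.find("tbody") is not None)
```

Indexing with tables[0] and calling find() directly should give the same element here, since there is only one table in the sample.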
#3
Thanks. That is progress: no error anymore. But I still cannot find any tag in the table.

table = html.find_all("table")
for each_table in table: print(each_table.find("tbody"))
None

table = html.find("table")
for each_table in table: print(each_table.find("tbody"))
-1
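
A likely reading of the -1 above: looping over a single Tag walks its children, and text children are string objects, so calling .find() on one of them is the ordinary str.find(), which returns -1 when the substring is missing. The sketch below uses made-up HTML (an assumption, not the real page) to show what the loop actually visits.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table with whitespace between the tags, for illustration only.
sample_html = "<table>\n<tbody><tr><td>1</td></tr></tbody>\n</table>"
table = BeautifulSoup(sample_html, "html.parser").find("table")

# Iterating over a Tag yields its direct children: text nodes and tags alike.
child_kinds = [type(child).__name__ for child in table]
print(child_kinds)

# Plain string search, which is what a text child ends up doing.
print("\n".find("tbody"))

# To search inside the table, call find() on the Tag itself instead of looping.
tbody = table.find("tbody")
tag_children = [child for child in table if isinstance(child, Tag)]
print(tbody.name, len(tag_children))
```

So with find(), no loop is needed; with find_all(), the loop runs over tables rather than over a table's children.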
#4
Does the page actually have a tbody in it? Most people just skip that tag, and I don't think BeautifulSoup adds tags where there aren't any.
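
A quick way to check that point, as a sketch with made-up markup: with the built-in 'html.parser', a table written without <tbody> stays without one, so find('tbody') returns None even though the rows are there. Browsers, by contrast, insert an implicit <tbody> when they build the DOM, so what a browser inspector shows can differ from the raw HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical table written without a <tbody>, for illustration only.
html_without_tbody = "<table><tr><td>1</td></tr></table>"
soup = BeautifulSoup(html_without_tbody, "html.parser")

table = soup.find("table")
tbody = table.find("tbody")   # html.parser keeps the markup as written
rows = table.find_all("tr")   # the rows are still reachable directly

print(tbody)
print(len(rows))
```

Searching for 'tr' directly avoids depending on whether a <tbody> is present.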
#5
Yes, it does. The table has the following structure.

<table class=..(lots of properties)..>
<thead>...</thead>
<tbody>
<tr ...>...</tr>
<tr ...>...</tr>
...
<tr ...>...</tr>
<tr ...>...</tr>
<tr ...>...</tr>
</tbody>
</table>
#6
That should be okay to parse.
from bs4 import BeautifulSoup

html_data = '''\
<table class='Foo'>
  <thead>...</thead>
  <tbody>
    <tr>1</tr>
    <tr>2</tr>
    <tr>4</tr>
    <tr>5</tr>
    <tr>6</tr>
  </tbody>
</table>'''

soup = BeautifulSoup(html_data, 'lxml')
Test:
>>> table = soup.find('table')
>>> tbody = table.find('tbody')
>>> tbody
<tbody>
<tr>1</tr>
<tr>2</tr>
<tr>4</tr>
<tr>5</tr>
<tr>6</tr>
</tbody>

>>> for item in tbody.find_all('tr'):
...     print(item.text)  
1
2
4
5
6

>>> # CSS selector
>>> soup.select('tr')
[<tr>1</tr>, <tr>2</tr>, <tr>4</tr>, <tr>5</tr>, <tr>6</tr>]
>>> [int(i.text) for i in soup.select('tr')]
[1, 2, 4, 5, 6]
#7
Yes, a typical table parses fine. The table at the URL below has one <tbody> with 100 <tr>s, but the following code cannot find it.
import requests
from bs4 import BeautifulSoup
url = 'https://www.theglobeandmail.com/investing/markets/stocks/market-leaders/'
page = requests.get(url)
soup = BeautifulSoup(page.content,"html.parser")
table = soup.find_all("table")
print(table[0])
for each_table in table: print(each_table.find("tbody"))
#8
Can you guess why?
Hint: it's used all over the web because of its unique position, and it has "good parts".
#9
You should take a look at the source for that url. There are no tables on that page.

The data is all loaded via JavaScript after the page arrives, so it is not in the HTML that requests downloads. Instead of scraping the page, just call the data.json URL directly, and you won't need to scrape anything.
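
As a rough sketch of that idea: the static HTML and the JSON payload below are both invented for illustration, since the thread does not give the real data URL or its format. The first part shows why a parser finds no tables in HTML that only contains a placeholder and a script; the second shows that JSON needs no HTML parsing at all.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical HTML as a server might send it before any script has run.
static_html = '<div id="leaders"></div><script src="app.js"></script>'


def count_static_tables(html_text):
    """Count <table> tags present in the HTML text as downloaded."""
    return len(BeautifulSoup(html_text, "html.parser").find_all("table"))


print(count_static_tables(static_html))

# Hypothetical JSON payload; the real field names are unknown (assumption).
payload_text = '{"rows": [{"symbol": "AAA", "last": 1.5}, {"symbol": "BBB", "last": 2.0}]}'
rows = json.loads(payload_text)["rows"]
symbols = [row["symbol"] for row in rows]
print(symbols)
```

With requests, the same step would look something like requests.get(json_url).json(), where json_url stands for whatever data URL the browser's network tab shows for that page.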
#10
When I use 'Inspect element' in IE11, I can see all the tags in that table.