Python Forum
Table data with BeatifulSoup - Printable Version



Table data with BeatifulSoup - gerry84 - Sep-24-2019

Hi All,

I'm learning Python right now (this is actually my first thread, so let me know if there is a clearer way to ask my question),
and I want to retrieve the rates from the table at the following link:
https://www.global-rates.com/interest-rates/libor/libor.aspx

however with the following code:

import urllib.request as ur
from bs4 import BeautifulSoup

url = input('Enter URL: ')

html = ur.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

data = []
table = soup.find('table', attrs={'class':'lineItemsTable'})
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])
I receive this error:

Error:
Traceback (most recent call last):
  File "....\global_rates.py", line 11, in <module>
    table_body = table.find('tbody')
AttributeError: 'NoneType' object has no attribute 'find'
Can anybody help me with that?

Thanks a lot!
Gerald


RE: Table data with BeatifulSoup - buran - Sep-24-2019

I don't see a table tag with the class attribute lineItemsTable.
Also, this site uses JavaScript, so you need a tool like Selenium to render the page before you can access the content.
I had a typo in the class name, and that confused me.
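
The AttributeError in the first post arises because soup.find() returns None when nothing matches, and the next line then calls .find() on that None. Below is a minimal sketch of one way to check which class values the tables on a page actually carry before settling on a selector. The HTML snippet and the helper name list_table_classes are made up for illustration; they are not taken from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; the real site's markup differs.
SAMPLE_HTML = """
<table class="layout"><tr><td>menu</td></tr></table>
<table class="rates"><tr><td>1 month</td><td>-0.50 %</td></tr></table>
"""


def list_table_classes(html):
    """Return the class list of every <table>, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # Tag.get returns the given default when the attribute is missing.
    return [table.get("class", []) for table in soup.find_all("table")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns None when no element matches, so check before using it.
missing = soup.find("table", attrs={"class": "lineItemsTable"})
if missing is None:
    print("no table with that class; classes present:")
    print(list_table_classes(SAMPLE_HTML))
```

Checking for None before calling further methods turns the traceback into a readable message, and the printed class lists show what a selector could match.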


RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Thanks Buran! I will follow the correct tagging going forward!

What would be the correct table tag to retrieve the table with the rates?
If I use 'tabledata1' I receive the same error:

import urllib.request as ur
from bs4 import BeautifulSoup

url = input('Enter URL: ')

html = ur.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

data = []
table = soup.find('table', attrs={'class':'tabledata1'})
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])



RE: Table data with BeatifulSoup - Larz60+ - Sep-24-2019

Try this
(it will scrape the page and show all table elements).
You will need to install requests and lxml:
pip install requests lxml
import requests
from bs4 import BeautifulSoup


def parsepage(page):
    soup = BeautifulSoup(page, 'lxml')
    table = soup.find('table')
    if table is not None:
        trs = table.find_all('tr')
        for n, tr in enumerate(trs):
            tds = tr.find_all('td')
            for n1, td in enumerate(tds):
                print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
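                # Note: td.prettify lacks call parentheses, so this prints the
                # bound-method repr; td.prettify() would print formatted HTML.
                print(f"{td.prettify}")
    else:
        print("Could not find table")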

def scrape_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        page = response.content
        parsepage(page)
    else:
        print(f"unable to retrieve {url}")

if __name__ == '__main__':
    url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
    scrape_url(url)
partial results:
Output:
------------------------------ tr_0, td_0 ------------------------------
<bound method Tag.prettify of <td>
<table cellpadding="0" cellspacing="0" style="width:100%;margin:10px 0px 0px 0px;">
<tr>
<td>
<img alt="" src="//www.global-rates.com/images/misc/ittybittyclear.gif" style="margin:3px 4px 3px 0px;"/>
</td>
<td align="right" valign="bottom">
<a href="//www.global-rates.com/"><img alt="English - worldwide actual interest rates and economic indicators" border="0" src="//www.global-rates.com/images/misc/gb.gif"/></a>
<a href="//nl.global-rates.com/"><img alt="Nederlands - actuele, internationale rentetarieven en economische indicatoren" border="0" src="//www.global-rates.com/images/misc/nl.gif"/></a>
<a href="//de.global-rates.com/"><img alt="Deutsch - aktuelle, internationale Zinssätze und Wirtschaftindikatoren" border="0" src="//www.global-rates.com/images/misc/de.gif"/></a>
<a href="//es.global-rates.com/"><img alt="Español - Español - tipos de interés e indicadores económicos actuales e internacionales" border="0" src="//www.global-rates.com/images/misc/es.gif"/></a>
<a href="//it.global-rates.com/"><img alt="Italiano - tassi d’interesse internazionali e sugli indicatori economici" border="0" src="//www.global-rates.com/images/misc/it.gif"/></a>
<a href="//fr.global-rates.com/"><img alt="Français - taux d’intérêts et indicateurs économiques actuelles et internationaux" border="0" src="//www.global-rates.com/images/misc/fr.gif"/></a>
<a href="//pt.global-rates.com/"><img alt="Português - taxas de juros actuais e internacionais e indicadores económicos" border="0" src="//www.global-rates.com/images/misc/pt.gif"/></a>
</td>
</tr>
</table>
</td>>

------------------------------ tr_0, td_1 ------------------------------
<bound method Tag.prettify of <td>
<img alt="" src="//www.global-rates.com/images/misc/ittybittyclear.gif" style="margin:3px 4px 3px 0px;"/>
</td>>



RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Thanks Larz60+! I was able to run the code successfully, but it seems to pick the wrong table.
It parses the first table, the one with all the different languages, instead of the table with the rates.
How can I tweak the code to jump to the right table?

Output:
------------------------------ tr_0, td_0 ------------------------------
<bound method Tag.prettify of <td>
<table cellpadding="0" cellspacing="0" style="width:100%;margin:10px 0px 0px 0px;">
<tr>
<td>
<img alt="" src="//www.global-rates.com/images/misc/ittybittyclear.gif" style="margin:3px 4px 3px 0px;"/>
</td>
<td align="right" valign="bottom">
<a href="//www.global-rates.com/"><img alt="English - worldwide actual interest rates and economic indicators" border="0" src="//www.global-rates.com/images/misc/gb.gif"/></a>
<a href="//nl.global-rates.com/"><img alt="Nederlands - actuele, internationale rentetarieven en economische indicatoren" border="0" src="//www.global-rates.com/images/misc/nl.gif"/></a>



RE: Table data with BeatifulSoup - metulburr - Sep-24-2019

Something funky is going on with that site, or I am just having a moment. If I loop over the tables with Larz's code modified to:
def parsepage(page):
    soup = BeautifulSoup(page, 'lxml')
    tables = soup.find_all('table')
    for table in tables:
        print(table)
        print("------------------------------------------------------------")
    return
I am able to see the table with the rates. But if I print that table by its index, tables[6], other tables appear before and after it (which makes no sense to me).
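
One possible explanation, offered as an assumption since the full page source is not reproduced in this thread: the partial output earlier shows a <table> sitting inside a <td>, and find_all('table') searches all descendants, so an outer layout table is returned along with every table nested inside it. Printing an outer table therefore also prints its inner tables. The sketch below uses made-up HTML (the id values are invented) to show the effect and one way to pick out only the innermost tables.

```python
from bs4 import BeautifulSoup

# Made-up markup: a layout table that wraps two inner tables.
SAMPLE_HTML = """
<table id="outer"><tr><td>
  <table id="menu"><tr><td>languages</td></tr></table>
  <table id="rates"><tr><td>1 month</td><td>-0.50 %</td></tr></table>
</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tables = soup.find_all("table")

# find_all walks all descendants, so nested tables get their own index.
ids = [t["id"] for t in tables]
print(ids)

# The outer table contains both inner tables, so printing it shows them too.
nested_in_outer = len(tables[0].find_all("table"))
print(nested_in_outer)

# A table with no further table inside it is a "leaf" and prints on its own.
leaf_ids = [t["id"] for t in tables if t.find("table") is None]
print(leaf_ids)
```

Filtering for leaf tables is only one option; selecting by a class or id on the target table, where one exists, is usually less fragile than a numeric index.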


RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Thanks everyone for your help :)
I was able to modify the code from Larz60+ a bit and get the correct table:
import requests
from bs4 import BeautifulSoup

lst=list()

def parsepage(page):
    soup = BeautifulSoup(page, 'lxml')
    table = soup.find_all('table')[7]
    if table is not None:
        trs = table.find_all('tr')[12:13]
        for n, tr in enumerate(trs):
            tds = tr.find_all('td')[:2]
            for n1, td in enumerate(tds):
                # print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                # print(f"{td.prettify}")
                # print(td.contents)
                # Iterating over a Tag yields its direct children (strings and tags)
                for key in td:
                    if key is not None:
                        lst.append(key)

    else:
        print("Could not find table")
    print(lst)




def scrape_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        page = response.content
        parsepage(page)
    else:
        print(f"unable to retrieve {url}")

if __name__ == '__main__':
    url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
    scrape_url(url)
my output is now a list with the following:
Output:
['\xa0', <a class="tabledatalink" href="/interest-rates/libor/european-euro/eur-libor-interest-rate-1-month.aspx" title="1 month European euro (EUR) LIBOR interest rate">Euro LIBOR - 1 month</a>, '-0.50200\xa0%']
However, I have not managed to turn this into a nice dictionary of name/rate pairs. I would like it to look like the following:
Output:
{'Euro LIBOR - 1 month': '-0.50200%', ...}
Could you help me to create this dictionary?
Thanks a lot!
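
A minimal sketch of one way to get from cells like those in the list above to a name/rate dictionary, using get_text() instead of collecting the raw child nodes. The sample rows are modelled on the output shown in this thread; the helper name rows_to_dict and the shortened href values are hypothetical, and the real table may contain rows this does not handle.

```python
from bs4 import BeautifulSoup

# Rows modelled on the cell contents shown in the thread; hrefs are shortened.
SAMPLE_HTML = """
<table>
<tr><td>&nbsp;<a class="tabledatalink" href="#">Euro LIBOR - 1 month</a></td><td>-0.50200&nbsp;%</td></tr>
<tr><td>&nbsp;<a class="tabledatalink" href="#">Euro LIBOR - 3 months</a></td><td>-0.42529&nbsp;%</td></tr>
</table>
"""


def rows_to_dict(html):
    """Map the text of each row's first cell to the text of its second cell."""
    soup = BeautifulSoup(html, "html.parser")
    rates = {}
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) < 2:
            continue  # skip rows that lack a name cell and a rate cell
        # get_text() drops the tags; the non-breaking space (\xa0) is
        # removed so it does not end up inside the key or the value.
        name = tds[0].get_text().replace("\xa0", " ").strip()
        rate = tds[1].get_text().replace("\xa0", "").strip()
        rates[name] = rate
    return rates


print(rows_to_dict(SAMPLE_HTML))
```

Working on the extracted text avoids searching the tag's string form for '">' and '</a>', which depends on the exact attribute order in the markup.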


RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Hi All,

just wanted to let you know that I was able to write the code :)
Thanks for all your help!!

import requests
from bs4 import BeautifulSoup

d=dict()

def parsepage(page):
    soup = BeautifulSoup(page, 'lxml')
    table = soup.find_all('table')[7]
    if table is not None:
        trs = table.find_all('tr')[8:15]
        for n, tr in enumerate(trs):
            tds = tr.find_all('td')[:2]
            for n1, td in enumerate(tds):
                # print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                # print(f"{td.prettify}")
                # print(td.contents)
                td = str(td)
                if td.find('LIBOR')>0:
                    spos = td.find('">')
                    epos = td.find('</a>')
                    title = td[spos+2:epos]
                    # The default of 0 remains if the row has no cell containing '%'
                    d[title]=d.get(title,0)

                if td.find('%')>0:
                    spos = td.find('>')
                    epos = td.find('%')
                    rate = td[spos+1:epos-1]
                    d[title] = rate



    else:
        print("Could not find table")




def scrape_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        page = response.content
        parsepage(page)
    else:
        print(f"unable to retrieve {url}")

if __name__ == '__main__':
    url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
    scrape_url(url)

print(d)
Output:
{'Euro LIBOR - overnight': '-0.56971', 'Euro LIBOR - 1 week': '-0.54743', 'Euro LIBOR - 2 weeks': 0, 'Euro LIBOR - 1 month': '-0.50200', 'Euro LIBOR - 2 months': '-0.44600', 'Euro LIBOR - 3 months': '-0.42529'}



RE: Table data with BeatifulSoup - gerry84 - Oct-22-2019

Hi guys!

I tried the code Larz60+ sent earlier on a different website but it seems like it is unable to retrieve any data. I went through the code and the website details but couldn't figure out what causes the error. Can someone point me in the right direction - not sure what is causing this...

Thanks a lot for your help!!
Gerald

Website: https://www.barchart.com/forex/quotes/%5EEURUSD/forward-rates?orderBy=bidPrice&orderDir=desc

import requests
from bs4 import BeautifulSoup


def parsepage(page):
    soup = BeautifulSoup(page, 'lxml')
    table = soup.find('table')
    if table is not None:
        trs = table.find_all('tr')
        for n, tr in enumerate(trs):
            tds = tr.find_all('td')
            for n1, td in enumerate(tds):
                print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                print(f"{td.prettify}")
    else:
        print("Could not find table")

def scrape_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        page = response.content
        parsepage(page)
    else:
        print(f"unable to retrieve {url}")

if __name__ == '__main__':
    url = 'https://www.barchart.com/forex/quotes/%5EEURUSD/forward-rates?orderBy=bidPrice&orderDir=desc'
    scrape_url(url)
Output:
unable to retrieve https://www.barchart.com/forex/quotes/%5EEURUSD/forward-rates?orderBy=bidPrice&orderDir=desc



RE: Table data with BeatifulSoup - Larz60+ - Oct-22-2019

change:
table = soup.find('table')
to
table = soup.find_all('table')[tableno]
Replace tableno with the index of the desired table: 0 = first, 1 = second, etc.
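
Note that the output in the previous post comes from the else branch of scrape_url: the response status was not 200, so parsepage never ran and the table index is not yet in play. A minimal sketch for seeing why the request failed is below. The function name fetch_page, its headers and timeout parameters, and the User-Agent value in the commented example are assumptions for illustration; whether this particular site accepts such a request is untested here.

```python
import requests


def fetch_page(url, headers=None, timeout=10):
    """Return the response body, or None after printing why the request failed."""
    response = requests.get(url, headers=headers, timeout=timeout)
    if response.status_code != 200:
        # The code tells apart e.g. 403 (refused) from 404 (not found).
        print(f"unable to retrieve {url}: "
              f"HTTP {response.status_code} {response.reason}")
        return None
    return response.content


# Example call (not executed here). The User-Agent value is a placeholder
# assumption; some servers reject the default one sent by requests.
# page = fetch_page(url, headers={"User-Agent": "Mozilla/5.0"})
```

If the status code shows that the page is served but the table is still missing from the HTML, the content may be filled in by JavaScript after loading, in which case requests alone will not see it.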