Table data with BeatifulSoup
Python Forum (https://python-forum.io), Web Scraping & Web Development, thread /thread-21314.html
Table data with BeatifulSoup - gerry84 - Sep-24-2019

Hi All,

I'm learning Python right now (this is actually my first thread, so let me know if there is a clearer way to ask my question) and I want to retrieve the rates from the table at this link: https://www.global-rates.com/interest-rates/libor/libor.aspx

However, with the following code:

    import urllib.request as ur
    from bs4 import BeautifulSoup

    url = input('Enter URL: ')
    html = ur.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    data = []
    table = soup.find('table', attrs={'class':'lineItemsTable'})
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])

I receive this error:

Can anybody help me with that? Thanks a lot!
Gerald

RE: Table data with BeatifulSoup - buran - Sep-24-2019

I don't see a table tag with the class attribute lineItemsTable. Also, this site is using JavaScript, so you need a tool like Selenium to render the webpage and be able to access the content.

I had a typo in the class and that confused me.

RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Thanks Buran! I will follow the correct tagging going forward! What would be the correct table tag to retrieve the table with the rates?
If I use 'tabledata1' I receive the same error:

    import urllib.request as ur
    from bs4 import BeautifulSoup

    url = input('Enter URL: ')
    html = ur.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    data = []
    table = soup.find('table', attrs={'class':'tabledata1'})
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])

RE: Table data with BeatifulSoup - Larz60+ - Sep-24-2019

Try this (it will scrape the page and show all elements of the first table). You will need to install requests and lxml:

    pip install requests lxml

    import requests
    from bs4 import BeautifulSoup


    def parsepage(page):
        soup = BeautifulSoup(page, 'lxml')
        # find() returns only the first <table> on the page, or None
        table = soup.find('table')
        if table is not None:
            trs = table.find_all('tr')
            for n, tr in enumerate(trs):
                tds = tr.find_all('td')
                for n1, td in enumerate(tds):
                    print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                    print(td.prettify())
        else:
            print("Could not find table")


    def scrape_url(url):
        response = requests.get(url)
        if response.status_code == 200:
            page = response.content
            parsepage(page)
        else:
            print(f"unable to retrieve {url}")


    if __name__ == '__main__':
        url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
        scrape_url(url)

partial results:
RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Thanks Larz60+! I was able to run the code successfully, but it seems to pick up the wrong table. It parses the first table, the one with all the different languages, instead of the table with the rates. How can I tweak the code to jump to the right table?
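
One way to jump to the right table without counting positions is to pick the table by the text it contains. The following is only a sketch under stated assumptions: SAMPLE_HTML is a hypothetical miniature page (not the real markup of the site), and find_table_containing is a helper name invented here for illustration. It uses the built-in 'html.parser' so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: a language-picker table followed by a rates table.
# The real page's markup is an assumption here and may differ.
SAMPLE_HTML = """
<table><tr><td>English</td><td>Deutsch</td></tr></table>
<table>
  <tr><td>USD LIBOR - overnight</td><td>1.50 %</td></tr>
  <tr><td>USD LIBOR - 1 week</td><td>1.60 %</td></tr>
</table>
"""


def find_table_containing(soup, marker):
    """Return the first innermost <table> whose text contains marker, or None."""
    for table in soup.find_all('table'):
        if marker not in table.get_text():
            continue
        # Skip wrapper tables: if a nested table also holds the marker,
        # that nested table will be reached later in the loop.
        if any(marker in inner.get_text() for inner in table.find_all('table')):
            continue
        return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = find_table_containing(soup, 'LIBOR')
rows = [[td.get_text(strip=True) for td in tr.find_all('td')]
        for tr in table.find_all('tr')]
print(rows)
```

Selecting by content tends to survive small layout changes better than a fixed index, although it still depends on the marker text staying on the page.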
RE: Table data with BeatifulSoup - metulburr - Sep-24-2019

Something funky is going on with that site, or I am just having a moment. If I loop over the tables with Larz's code modified to:

    def parsepage(page):
        soup = BeautifulSoup(page, 'lxml')
        tables = soup.find_all('table')
        for table in tables:
            print(table)
            print("------------------------------------------------------------")
        return

I am able to see the table with the rates. But if I go to the index of that table, tables[6], there are other tables before and after it (makes no sense).
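
A possible explanation for seeing other tables around tables[6] is nesting: find_all('table') searches recursively, so a layout table and every table inside it all appear in the result, and printing the outer one prints the inner ones as well. Whether the page in this thread is actually built that way is an assumption. The sketch below uses a hypothetical NESTED_HTML snippet to show the effect.

```python
from bs4 import BeautifulSoup

# Hypothetical layout: an outer table wrapping two inner tables.
NESTED_HTML = """
<table id="layout">
  <tr><td>
    <table id="menu"><tr><td>Home</td></tr></table>
  </td><td>
    <table id="rates"><tr><td>USD LIBOR - overnight</td><td>1.50 %</td></tr></table>
  </td></tr>
</table>
"""

soup = BeautifulSoup(NESTED_HTML, 'html.parser')

# find_all is recursive: outer and inner tables are returned in document order.
tables = soup.find_all('table')
ids = [t['id'] for t in tables]
print(ids)

# The first result contains the other two as descendants,
# so print(tables[0]) would show all three tables.
nested_in_first = [t['id'] for t in tables[0].find_all('table')]
print(nested_in_first)

# Tables with no table inside them are usually the ones holding data.
leaf_ids = [t['id'] for t in tables if t.find('table') is None]
print(leaf_ids)
```

If nesting is the cause, filtering for tables that contain no further table narrows the candidates before choosing an index.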
RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Thanks everyone for your help :) I was able to modify the code from Larz60+ a bit and get the correct table:

    import requests
    from bs4 import BeautifulSoup

    lst = list()


    def parsepage(page):
        soup = BeautifulSoup(page, 'lxml')
        table = soup.find_all('table')[7]
        if table is not None:
            trs = table.find_all('tr')[12:13]
            for n, tr in enumerate(trs):
                tds = tr.find_all('td')[:2]
                for n1, td in enumerate(tds):
                    # print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                    # print(td.prettify())
                    # print(td.contents)
                    for key in td:
                        if key is not None:
                            lst.append(key)
        else:
            print("Could not find table")
        print(lst)


    def scrape_url(url):
        response = requests.get(url)
        if response.status_code == 200:
            page = response.content
            parsepage(page)
        else:
            print(f"unable to retrieve {url}")


    if __name__ == '__main__':
        url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
        scrape_url(url)

My output is now a list with the following:

However, I fail to make a nice dictionary of tuples... I would like to make it look like the following:

Could you help me to create this dictionary? Thanks a lot!

RE: Table data with BeatifulSoup - gerry84 - Sep-24-2019

Hi All, just wanted to let you know that I was able to write the code :) Thanks for all your help!!
    import requests
    from bs4 import BeautifulSoup

    d = dict()


    def parsepage(page):
        soup = BeautifulSoup(page, 'lxml')
        table = soup.find_all('table')[7]
        if table is not None:
            trs = table.find_all('tr')[8:15]
            for n, tr in enumerate(trs):
                tds = tr.find_all('td')[:2]
                for n1, td in enumerate(tds):
                    # print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                    # print(td.prettify())
                    # print(td.contents)
                    td = str(td)
                    # title cell: take the link text between '">' and '</a>'
                    if td.find('LIBOR') > 0:
                        spos = td.find('">')
                        epos = td.find('</a>')
                        title = td[spos+2:epos]
                        d[title] = d.get(title, 0)
                    # rate cell: take the text between the first '>' and the '%' sign
                    if td.find('%') > 0:
                        spos = td.find('>')
                        epos = td.find('%')
                        rate = td[spos+1:epos-1]
                        d[title] = rate
        else:
            print("Could not find table")


    def scrape_url(url):
        response = requests.get(url)
        if response.status_code == 200:
            page = response.content
            parsepage(page)
        else:
            print(f"unable to retrieve {url}")


    if __name__ == '__main__':
        url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
        scrape_url(url)
        print(d)
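
The string slicing above works, but it depends on the exact HTML of each cell. A somewhat more robust variant, sketched here under assumptions, reads the cell text with get_text() and builds the dictionary from that. ROWS_HTML is a hypothetical stand-in for the rate rows (a linked title in the first cell, the rate in the second), and rates_from_table is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical rows modeled on the thread's description;
# the real page's markup and values may differ.
ROWS_HTML = """
<table>
  <tr><td><a href="/libor-usd-overnight">USD LIBOR - overnight</a></td><td>1.50 %</td><td>1.49 %</td></tr>
  <tr><td><a href="/libor-usd-1-week">USD LIBOR - 1 week</a></td><td>1.60 %</td><td>1.61 %</td></tr>
  <tr><td>Some header row</td></tr>
</table>
"""


def rates_from_table(table):
    """Map each LIBOR title to its first rate column, both as strings."""
    rates = {}
    for tr in table.find_all('tr'):
        # Only the first two cells matter: title and most recent rate.
        cells = [td.get_text(strip=True) for td in tr.find_all('td')[:2]]
        if len(cells) == 2 and 'LIBOR' in cells[0] and cells[1].endswith('%'):
            rates[cells[0]] = cells[1].rstrip('%').strip()
    return rates


soup = BeautifulSoup(ROWS_HTML, 'html.parser')
rates = rates_from_table(soup.find('table'))
print(rates)
```

Because the function takes a table and returns a new dictionary, it avoids the module-level d and can be combined with whatever method is used to locate the table.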
RE: Table data with BeatifulSoup - gerry84 - Oct-22-2019

Hi guys!

I tried the code Larz60+ sent earlier on a different website, but it seems unable to retrieve any data. I went through the code and the website details but couldn't figure out what causes the problem. Can someone point me in the right direction?

Thanks a lot for your help!!
Gerald

Website: https://www.barchart.com/forex/quotes/%5EEURUSD/forward-rates?orderBy=bidPrice&orderDir=desc

    import requests
    from bs4 import BeautifulSoup


    def parsepage(page):
        soup = BeautifulSoup(page, 'lxml')
        table = soup.find('table')
        if table is not None:
            trs = table.find_all('tr')
            for n, tr in enumerate(trs):
                tds = tr.find_all('td')
                for n1, td in enumerate(tds):
                    print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
                    print(td.prettify())
        else:
            print("Could not find table")


    def scrape_url(url):
        response = requests.get(url)
        if response.status_code == 200:
            page = response.content
            parsepage(page)
        else:
            print(f"unable to retrieve {url}")


    if __name__ == '__main__':
        url = 'https://www.barchart.com/forex/quotes/%5EEURUSD/forward-rates?orderBy=bidPrice&orderDir=desc'
        scrape_url(url)
RE: Table data with BeatifulSoup - Larz60+ - Oct-22-2019

Change:

    table = soup.find('table')

to:

    table = soup.find_all('table')[tableno]

Replace tableno with the index of the desired table: 0 = first, 1 = second, etc.
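
One caveat with indexing: find_all('table')[tableno] raises IndexError when the page has fewer tables than expected, for example when the downloaded HTML contains no table at all. Whether that is the case for the second website is not established in this thread. The sketch below wraps the lookup in a hypothetical helper, nth_table, that returns None instead of raising, so the existing "if table is not None" check keeps working.

```python
from bs4 import BeautifulSoup


def nth_table(page, tableno, parser='html.parser'):
    """Return the table at index tableno (0 = first), or None if out of range."""
    soup = BeautifulSoup(page, parser)
    tables = soup.find_all('table')
    if 0 <= tableno < len(tables):
        return tables[tableno]
    return None


# Hypothetical input with exactly two tables.
SAMPLE = (
    "<table><tr><td>first</td></tr></table>"
    "<table><tr><td>second</td></tr></table>"
)

second = nth_table(SAMPLE, 1)
print(second.get_text(strip=True))

# An index past the end, or a page with no tables, yields None.
missing = nth_table(SAMPLE, 5)
empty = nth_table("<div>no tables here</div>", 0)
print(missing, empty)
```

Printing len(soup.find_all('table')) first is a quick way to check how many tables the downloaded page really contains before choosing an index.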