Python Forum
Table data with BeautifulSoup
#11
Still getting the same error; it seems like the website is not responding at all :(
Could it be that something in the "scrape_url" part needs to be changed?
Thanks again for all the help!

import requests


def scrape_url(url):
    response = requests.get(url)
    if response.status_code == 200:
        page = response.content
        parsepage(page)  # parsepage() is defined elsewhere in the script
    else:
        print(f"unable to retrieve {url}")
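One thing worth checking: the scrape_url() above calls requests.get() without a timeout, so if the server accepts the connection but never answers, the call can hang instead of raising an error. Below is a minimal sketch of a fetch helper that sets a timeout and prints the actual exception. It assumes the problem is a hang or a connection failure, which the thread has not confirmed. The name fetch_page and the 10-second default are illustrative choices, not part of the original code.

```python
import requests


def fetch_page(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    try:
        # timeout covers connecting and waiting for data from the server,
        # so a silent server raises requests.Timeout instead of hanging
        response = requests.get(url, timeout=timeout)
        # turn 4xx/5xx status codes into requests.HTTPError
        response.raise_for_status()
    except requests.RequestException as exc:
        # print the real reason (timeout, refused connection, HTTP status, ...)
        print(f"unable to retrieve {url}: {exc!r}")
        return None
    return response.content
```

Calling fetch_page(url) in place of the bare requests.get() should at least show whether the server is refusing the connection, timing out, or answering with an error status.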
#12
I rewrote this code (the original was written at 1:00 AM, no doubt after I'd been at it since 4 the previous morning).

I've added code to better prettify the HTML and to display only the table at index 6 (the seventh <table> on the page). Its HTML is bizarre: there are tr's nested within td's, and so on.
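Because the layout tables are nested, table.find_all('tr') also returns every row of every inner table. Passing recursive=False limits the search to direct children. The sketch below shows this on made-up markup and wraps it in a hypothetical helper, table_to_rows, that returns one list of cell strings per row. It assumes the rows sit directly under <table> or under a direct <thead>/<tbody>/<tfoot>. It uses the built-in 'html.parser', which does not insert <tbody>. A parser that does (html5lib, for example) would put the rows under <tbody>.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Return the table's own rows as lists of stripped cell texts.

    Rows of tables nested inside cells are not returned as separate rows;
    their text still ends up in the enclosing cell.
    """
    # direct <tr> children, plus rows under direct <thead>/<tbody>/<tfoot>
    sections = [table] + table.find_all(["thead", "tbody", "tfoot"], recursive=False)
    rows = []
    for section in sections:
        for tr in section.find_all("tr", recursive=False):
            cells = tr.find_all(["td", "th"], recursive=False)
            rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


# Made-up markup: the outer table's second cell holds a two-row inner table
html = """
<table id="outer">
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td> a </td><td>
    <table id="inner">
      <tr><td>b</td></tr>
      <tr><td>c</td></tr>
    </table>
  </td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("table", id="outer")

print(len(outer.find_all("tr")))                   # 4: both outer rows + both inner rows
print(len(outer.find_all("tr", recursive=False)))  # 2: only the outer table's own rows
print(table_to_rows(outer))                        # [['Name', 'Value'], ['a', 'b c']]
```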

Good luck with this one.

PrettifyPage.py (place it in the same directory as the other script):
# PrettifyPage.py

from bs4 import BeautifulSoup
import pathlib


class PrettifyPage:
    def __init__(self):
        pass

    def prettify(self, soup, indent):
        pretty_soup = str()
        previous_indent = 0
        for line in soup.prettify().split("\n"):
            current_indent = str(line).find("<")
            if current_indent == -1 or current_indent > previous_indent + 2:
                current_indent = previous_indent + 1
            previous_indent = current_indent
            pretty_soup += self.write_new_line(line, current_indent, indent)
        return pretty_soup

    def write_new_line(self, line, current_indent, desired_indent):
        # prettify() indents tags one space per level; for a tag line this
        # padding brings the total to current_indent * desired_indent spaces
        spaces_to_add = (current_indent * desired_indent) - current_indent
        return " " * max(spaces_to_add, 0) + str(line) + "\n"

if __name__ == '__main__':
    pp = PrettifyPage()
    # self-test: prettify a locally saved HTML file (path relative to the current directory)
    pfilename = pathlib.Path('BusinessEntityRecordsAA.html')
    with pfilename.open('rb') as fp:
        page = fp.read()
    soup = BeautifulSoup(page, 'lxml')
    pretty = pp.prettify(soup, indent=2)
    print(pretty)
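The arithmetic in write_new_line works because bs4's prettify() indents each nesting level by one space. A tag at depth d already starts at column d, so adding d * indent - d spaces moves it to column d * indent. The sketch below is a simplified stand-alone version of the same idea. It rescales every line by its leading spaces and does not reproduce PrettifyPage's special handling of text lines. The function name rescale_indent is made up for illustration.

```python
def rescale_indent(pretty_text, width):
    """Rescale one-space-per-level indentation to `width` spaces per level."""
    out = []
    for line in pretty_text.split("\n"):
        depth = len(line) - len(line.lstrip(" "))  # leading spaces = nesting depth
        # same total as PrettifyPage: depth existing + (depth * width - depth) added
        out.append(" " * (depth * width) + line.lstrip(" "))
    return "\n".join(out)


sample = "<tr>\n <td>\n  x\n </td>\n</tr>"
print(rescale_indent(sample, 2))
```

With width=2, the <td> lines move to two spaces and the text x moves to four.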
# Main script (keep PrettifyPage.py in the same directory)
import requests
from bs4 import BeautifulSoup
import PrettifyPage


def parsepage(page):
    pp = PrettifyPage.PrettifyPage()
    if not page:
        print("No page content to parse")
        return
    soup = BeautifulSoup(page, 'lxml')
    tables = soup.find_all('table')
    if len(tables) <= 6:
        print(f"Could not find table (page has only {len(tables)} tables)")
        return
    table = tables[6]
    trs = table.find_all('tr')
    for n, tr in enumerate(trs):
        print(f"\n------------------------------ tr_{n} ------------------------------")
        print(pp.prettify(tr, 2))
        # tds = tr.find_all('td')
        # for n1, td in enumerate(tds):
        #     print(f"\n------------------------------ tr_{n}, td_{n1} ------------------------------")
        #     print(pp.prettify(td, 2))

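def get_page(url):
    # timeout stops the call from hanging forever if the server never answers
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        return response.content
    print(f"unable to retrieve {url} (status {response.status_code})")
    return None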

def scrape_url(url):
    parsepage(get_page(url))

if __name__ == '__main__':
    url = 'https://www.global-rates.com/interest-rates/libor/libor.aspx'
    scrape_url(url)
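Picking the table by position (find_all('table')[6]) breaks as soon as the page layout changes. An alternative is to pick the innermost table whose text contains a known string. This is a sketch only: the helper name find_innermost_table is hypothetical, and the text that actually identifies the LIBOR table on the live page has not been checked. The example uses made-up markup.

```python
from bs4 import BeautifulSoup


def find_innermost_table(soup, needle):
    """Return the innermost <table> whose text contains `needle`, or None."""
    candidates = [t for t in soup.find_all("table") if needle in t.get_text()]
    candidate_ids = {id(t) for t in candidates}
    for table in candidates:
        # skip layout tables that merely wrap another matching table
        if not any(id(inner) in candidate_ids for inner in table.find_all("table")):
            return table
    return None


# Made-up markup: a layout table wrapping the table of interest
html = """
<table><tr><td>
  <table><tr><td>Rates</td></tr><tr><td>LIBOR USD</td></tr></table>
</td></tr></table>
<table><tr><td>Other</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
table = find_innermost_table(soup, "LIBOR")
print(table.find("td").get_text())  # Rates
```

Object identity (id()) is used instead of == because bs4 compares tags by their content, so two identical tables would compare equal.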
Partial output:

------------------------------ tr_0 ------------------------------
<tr style="height:100%;" valign="top">
  <td>
    <table cellpadding="0" cellspacing="0" style="height:100%;">
      <tr>
        <td colspan="4">
          <table cellpadding="0" cellspacing="0" style="width:100%;">
            <tr>
              <td>
                <table style="margin:6px 0px 0px 0px;width:100%;">
                  <tr>
                    <td align="center">
                      <script type="text/javascript">
                        <!--
                        google_ad_client = "ca-pub-8844689419180727";
                        /* GR 728x90 positie 1 */
                        google_ad_slot = "6980580330";
                        google_ad_width = 728;
                        google_ad_height = 90;
                        //-->
                      </script>
                      <script src="https://pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript">
                      </script>
                    </td>
                  </tr>
                </table>
              </td>
            </tr>
          </table>
        </td>
      </tr>
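Much of that output is the Google ad <script> embedded in the layout table. If the scripts are not needed, they can be removed before prettifying. Here is a sketch using bs4's Tag.decompose() on made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up markup: a cell holding a script plus some real content
html = """
<table><tr><td align="center">
  <script type="text/javascript">var x = 1;</script>
  <b>kept</b>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# remove every <script> element (and its contents) from the tree;
# find_all() returns a list, so deleting while looping over it is safe
for script in soup.find_all("script"):
    script.decompose()

print(soup.find("script"))           # None
print(soup.td.get_text(strip=True))  # kept
```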