Mar-13-2024, 10:23 AM
(This post was last modified: Mar-13-2024, 10:23 AM by Pedroski55.)
Whatever you want to do, 99.99% of the time, someone has already done it. stackoverflow.com is a good place to look!
Why not parse a very simple .html file on your computer first, to see how to do this?
I am not sure exactly what you want to do. Let's take things slowly.
As an example, the code below will get the text from tables in an html file on my local machine and save the text in a csv file.
```python
from bs4 import BeautifulSoup
import csv

# get a local html file for testing
# this file has 2 tables
URL = "/var/www/html/22BE1cw/22BE1sW1.html.php"
savepath = '/home/pedro/tmp/table_text.csv'

with open(URL, "r") as localfile:
    contents = localfile.read()

soup = BeautifulSoup(contents, 'lxml')

# Function from https://stackoverflow.com/questions/2935658/beautifulsoup-get-the-contents-of-a-specific-table
def tableDataText(table):
    def rowgetDataText(tr, coltag='td'):  # td (data) or th (header)
        return [td.get_text(strip=True) for td in tr.find_all(coltag)]
    rows = []
    trs = table.find_all('tr')
    headerow = rowgetDataText(trs[0], 'th')
    if headerow:  # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs:  # for every table row
        rows.append(rowgetDataText(tr, 'td'))  # data row
    return rows

# get the first table
# put the class name to find a specific table
#htmltable = soup.find('table', { 'class' : 'div-table' })
# find the first table
htmltable = soup.find('table')
tabletext = tableDataText(htmltable)

# save the text to a csv
# newline='' stops the csv module writing blank lines between rows on Windows
with open(savepath, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerows(tabletext)

# if you want all tables
htmltables = soup.find_all('table')
# returns a list so need to loop through the list
# note: opening with 'w' again overwrites the file written above
with open(savepath, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for t in htmltables:
        tabletext = tableDataText(t)  # returns a list of 2 element lists
        writer.writerows(tabletext)
```

What exactly is the first thing you want to do?
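
If bs4 or lxml is not installed yet, here is a rough sketch of the same idea using only Python's standard library (html.parser and csv). It is not the code above, just a simplified stand-in. The sample HTML, the class name TableTextParser and the function tables_to_csv_text are all made up for illustration, and it does not cope with nested tables.

```python
import csv
import io
from html.parser import HTMLParser

# A tiny made-up HTML document with two tables (hypothetical test data)
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Anna</td><td>90</td></tr>
  <tr><td>Ben</td><td>85</td></tr>
</table>
<table>
  <tr><td>x</td><td>1</td></tr>
</table>
</body></html>
"""


class TableTextParser(HTMLParser):
    """Collect the cell text of every table as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._row = None   # the row currently being filled
        self._cell = None  # text pieces of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # only keep text that sits inside a td or th
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def tables_to_csv_text(html):
    """Return (tables, csv_text) for all tables found in the html string."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()  # in-memory stand-in for a csv file
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    for rows in parser.tables:
        writer.writerows(rows)
    return parser.tables, out.getvalue()


tables, csv_text = tables_to_csv_text(SAMPLE_HTML)
print(csv_text)
```

To write a real file instead of a string, swap the io.StringIO() for open(savepath, 'w', newline=''), the same as in the bs4 version.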