WebElements of an HTML page

Nik1811 · Mar-14-2024, 12:39 PM

Thank you so much @Pedroski55 for taking the time to reply. After reading through many websites, mostly Stack Overflow, I've completed all the steps I mentioned earlier. My next challenge is parsing the generated XML and comparing it with an existing baseline XML.

The comparison should check that both attributes and values match exactly, with a few exceptions.
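A minimal sketch of such a comparison, using only the standard library's `xml.etree.ElementTree`. It assumes the "few exceptions" are attribute names whose values may legitimately differ; `IGNORED_ATTRIBUTES`, `compare_elements` and the sample XML strings are hypothetical, made up for illustration. Children are compared pairwise in document order, so elements that are identical but reordered would be reported as differences.

```python
import xml.etree.ElementTree as ET

# Hypothetical exceptions: attributes whose values are allowed to differ
# (for example generated timestamps). Replace with your own list.
IGNORED_ATTRIBUTES = {"timestamp"}


def compare_elements(generated, baseline, path=None, ignored=IGNORED_ATTRIBUTES):
    """Recursively compare two elements and return a list of differences."""
    here = path or f"/{baseline.tag}"
    if generated.tag != baseline.tag:
        return [f"{here}: tag {generated.tag!r} != {baseline.tag!r}"]

    diffs = []
    # Attributes must match exactly, except for the ignored names
    gen_attrs = {k: v for k, v in generated.attrib.items() if k not in ignored}
    base_attrs = {k: v for k, v in baseline.attrib.items() if k not in ignored}
    if gen_attrs != base_attrs:
        diffs.append(f"{here}: attributes {gen_attrs} != {base_attrs}")

    # Text values must match exactly (surrounding whitespace is ignored)
    gen_text = (generated.text or "").strip()
    base_text = (baseline.text or "").strip()
    if gen_text != base_text:
        diffs.append(f"{here}: text {gen_text!r} != {base_text!r}")

    # Children are compared pairwise, in document order;
    # [i] is the child's 1-based position within its parent
    gen_children, base_children = list(generated), list(baseline)
    if len(gen_children) != len(base_children):
        diffs.append(f"{here}: {len(gen_children)} children != {len(base_children)}")
    for i, (g, b) in enumerate(zip(gen_children, base_children), start=1):
        diffs.extend(compare_elements(g, b, f"{here}/{b.tag}[{i}]", ignored))
    return diffs


# Made-up sample documents; for real files use ET.parse("file.xml").getroot()
generated_xml = ('<config version="2" timestamp="2024-03-14">'
                 '<item name="a">1</item><item name="b">2</item></config>')
baseline_xml = ('<config version="2" timestamp="2024-01-01">'
                '<item name="a">1</item><item name="b">3</item></config>')

for diff in compare_elements(ET.fromstring(generated_xml), ET.fromstring(baseline_xml)):
    print(diff)
```

For the sample documents this prints `/config/item[2]: text '2' != '3'`; the differing `timestamp` on the root element is not reported because it is listed as an exception.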

(Mar-13-2024, 10:23 AM) Pedroski55 wrote: Whatever you want to do, 99.99% of the time someone has already done it. stackoverflow.com is a good place to look!

Why not parse a very simple .html file on your computer first, to see how to do this?

I am not sure exactly what you want to do. Let's take things slowly.

As an example, the code below gets the text from the tables in an HTML file on my local machine and saves it to a CSV file.

from bs4 import BeautifulSoup  # pip install beautifulsoup4 lxml
import csv

# get a local html file for testing
# this file has 2 tables
URL = "/var/www/html/22BE1cw/22BE1sW1.html.php"  # a local file path, not a web URL
savepath = '/home/pedro/tmp/table_text.csv'

with open(URL, "r") as localfile:
    contents = localfile.read()

soup = BeautifulSoup(contents, 'lxml')  # 'html.parser' also works if lxml is not installed

# Function from https://stackoverflow.com/questions/2935658/beautifulsoup-get-the-contents-of-a-specific-table
def tableDataText(table):
    """Return the text of a <table> as a list of rows, each row a list of cell strings."""
    def rowgetDataText(tr, coltag='td'):  # td (data) or th (header)
        return [td.get_text(strip=True) for td in tr.find_all(coltag)]
    rows = []
    trs = table.find_all('tr')
    if not trs:  # table without rows: avoid an IndexError on trs[0]
        return rows
    headerow = rowgetDataText(trs[0], 'th')
    if headerow:  # if there is a header row, include it first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs:  # for every remaining table row
        rows.append(rowgetDataText(tr, 'td'))  # data row
    return rows

# get the first table
#htmltable = soup.find('table', { 'class' : 'div-table' }) # put the class name to find a specific table
# find the first table (find() returns None if the page has no table)
htmltable = soup.find('table')
if htmltable is not None:
    tabletext = tableDataText(htmltable)
    # save the text to a csv; newline='' stops csv.writer producing blank lines on Windows
    with open(savepath, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerows(tabletext)

# if you want all tables (note: this overwrites the csv written above)
htmltables = soup.find_all('table')  # returns a list, so loop through it
with open(savepath, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for t in htmltables:
        tabletext = tableDataText(t)  # a list of rows, one list of cell strings per row
        writer.writerows(tabletext)
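To try the same idea on a very simple HTML snippet without a local file, BeautifulSoup or lxml, here is a minimal sketch that swaps in the standard library's `html.parser.HTMLParser` instead. `TableTextParser` and the sample table are hypothetical; the sketch assumes well-formed markup with explicit closing tags and no nested tables, and `io.StringIO` stands in for the CSV file.

```python
import csv
import io
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped into rows and tables."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored
        if self._cell is not None:
            self._cell.append(data)


html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Ann</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""
parser = TableTextParser()
parser.feed(html)
parser.close()

buffer = io.StringIO()  # stands in for open(savepath, 'w', newline='')
writer = csv.writer(buffer)
for table in parser.tables:
    writer.writerows(table)
print(buffer.getvalue())
```

Here `parser.tables` holds one table with the rows `['Name', 'Score']`, `['Ann', '90']` and `['Bob', '85']`, the same shape `tableDataText` returns.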

What exactly is the first thing you want to do?
