Python Forum
WebElements of an HTML page
#1
Hi! I'm new to Python programming and trying to create a POC for my project.

Goal:
1) Find all elements on the webpage (both inside and outside table grids)
2) Count the number of elements
3) Categorize them as text fields, buttons, checkboxes, dropdowns
4) If input type = text field AND disabled = false, loop over them and enter 1, 2, 3... sequentially
5) If input type = checkbox AND disabled = false, select it
6) If input type = dropdown AND disabled = false, select the first value

Currently, I'm stuck at step 1. When I try to find all web elements, I get a huge list of every <td>, none of which has a specific attribute I can filter on and categorize for step 2. Please help with a simple solution, if one exists.

Below is the code I'm trying:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Assumption: the post did not show how the driver was created or which page was opened.
# This is the usual webdriver_manager setup; a driver.get(...) call for the site is still needed.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=Options())

# Navigate to *** template
driver.find_element(By.XPATH, "/html/body/div/div[3]/ul/li[1]/a").click()
driver.find_element(By.XPATH, "/html/body/div/div[3]/ul/li[1]/ul/li[2]/a").click()
driver.find_element(By.XPATH, "//table//a[text()= 'AIS Import']").click()

# Click on the first template (should be AUTO:..)
links = driver.find_elements(By.XPATH, "//table//a")
links[0].click()

# Start creating ***
driver.find_element(By.XPATH, "//*[@id='job_reference']").send_keys("21")

# TODO: find a function to write to all 'td' elements from the table
ais_links = driver.find_elements(By.XPATH, "//table[@id='table_no_size']//td")
count = 0

# Print the inner HTML of every cell, then the total number of cells
for link in ais_links:
    count += 1
    print(link.get_attribute("innerHTML"))
print(count)
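
A possible way around the <td> flood in the loop above, offered as a sketch rather than a tested solution for this page: ask Selenium for the form controls directly instead of the table cells, then branch on tag_name and the type attribute. The names CONTROLS, categorize and fill_enabled_controls are made up for illustration. The locator strings "css selector" and "tag name" are the plain-string values behind By.CSS_SELECTOR and By.TAG_NAME, so the function takes an existing Selenium 4 driver as a parameter and needs no Selenium import of its own.

```python
from collections import Counter

# Hypothetical selector: every control type named in the goal list.
CONTROLS = "input, select, textarea, button"


def categorize(element):
    """Map a WebElement to 'text', 'checkbox', 'dropdown', 'button' or 'other'."""
    tag = element.tag_name.lower()
    # An <input> with no type attribute behaves as a text field.
    kind = (element.get_attribute("type") or "text").lower()
    if tag == "select":
        return "dropdown"
    if tag == "textarea" or (tag == "input" and kind == "text"):
        return "text"
    if tag == "input" and kind == "checkbox":
        return "checkbox"
    if tag == "button" or (tag == "input" and kind in ("button", "submit", "reset")):
        return "button"
    return "other"  # radio buttons, hidden inputs, file pickers, ...


def fill_enabled_controls(driver):
    """Walk steps 1-6 of the goal list; return a Counter of the categories found."""
    elements = driver.find_elements("css selector", CONTROLS)
    counts = Counter()
    next_value = 1
    for element in elements:
        category = categorize(element)
        counts[category] += 1
        if not element.is_enabled():
            continue  # disabled controls are counted but left alone
        if category == "text":
            element.send_keys(str(next_value))  # 1, 2, 3, ...
            next_value += 1
        elif category == "checkbox" and not element.is_selected():
            element.click()
        elif category == "dropdown":
            options = element.find_elements("tag name", "option")
            if options:
                options[0].click()  # first value in the list
    return counts
```

With a live session this would be called as counts = fill_enabled_controls(driver) once the form is on screen, and sum(counts.values()) gives the total for step 2. Selenium's Select helper is the more usual way to pick a dropdown value; clicking the first <option> is used here only to keep the sketch free of imports.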
Larz60+ wrote Mar-13-2024, 04:05 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
Whatever you want to do, 99.99% of the time, someone has already done it. stackoverflow.com is a good place to look!

Why not parse a very simple .html file on your computer first, to see how to do this?

I am not sure exactly what you want to do. Let's take things slowly.

As an example, the code below gets the text from the tables in an HTML file on my local machine and saves it to a CSV file.

from bs4 import BeautifulSoup
import csv

# get a local html file for testing
# this file has 2 tables
URL = "/var/www/html/22BE1cw/22BE1sW1.html.php"
savepath = '/home/pedro/tmp/table_text.csv'

with open(URL, "r") as localfile:
    contents = localfile.read()

soup = BeautifulSoup(contents, 'lxml')  # the 'lxml' parser needs the lxml package installed

# Function from https://stackoverflow.com/questions/2935658/beautifulsoup-get-the-contents-of-a-specific-table
def tableDataText(table):
    def rowgetDataText(tr, coltag='td'): # td (data) or th (header)
        return [td.get_text(strip=True) for td in tr.find_all(coltag)]
    rows = []
    trs = table.find_all('tr')
    if not trs: # a table without rows has nothing to extract
        return rows
    headerow = rowgetDataText(trs[0], 'th')
    if headerow: # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs: # for every table row
        rows.append(rowgetDataText(tr, 'td')) # data row
    return rows

# get the first table
#htmltable = soup.find('table', { 'class' : 'div-table' }) # put the class name to find a specific table
# find the first table
htmltable = soup.find('table')
tabletext = tableDataText(htmltable)
# save the text to a csv (newline='' stops the csv module writing blank lines on Windows)
with open(savepath, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerows(tabletext)

# if you want all tables (this overwrites the file written above)
htmltables = soup.find_all('table') # returns a list so need to loop through the list
with open(savepath, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for t in htmltables:
        tabletext = tableDataText(t) # returns a list of rows, each row a list of cell strings
        writer.writerows(tabletext)
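
In the same spirit of testing offline first, the form controls of a saved page can be tallied before any browser is involved. This is only a sketch under assumptions: the SAMPLE markup and the class name ControlCounter are invented for illustration, and the standard library's html.parser is swapped in for BeautifulSoup purely to keep the example free of third-party packages.

```python
from collections import Counter
from html.parser import HTMLParser

# Which <input type="..."> values fall into which category (illustrative, not exhaustive).
INPUT_KINDS = {"text": "text", "checkbox": "checkbox", "button": "button", "submit": "button"}


class ControlCounter(HTMLParser):
    """Tally form controls by (category, enabled/disabled)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            category = "dropdown"
        elif tag == "button":
            category = "button"
        elif tag == "textarea":
            category = "text"
        elif tag == "input":
            # An <input> without a type attribute is a text field.
            kind = (attrs.get("type") or "text").lower()
            category = INPUT_KINDS.get(kind, "other")
        else:
            return  # <td>, <tr>, <div>, ... are not controls
        # 'disabled' is a boolean attribute: its presence is what matters.
        state = "disabled" if "disabled" in attrs else "enabled"
        self.counts[(category, state)] += 1


# Made-up page fragment standing in for the real form.
SAMPLE = """
<table id="table_no_size">
  <tr><td><input type="text" name="a"></td><td><input type="text" name="b" disabled></td></tr>
  <tr><td><input type="checkbox" name="c"></td>
      <td><select name="d"><option>x</option></select></td></tr>
</table>
<button>Save</button>
"""

counter = ControlCounter()
counter.feed(SAMPLE)
for (category, state), n in sorted(counter.counts.items()):
    print(category, state, n)
print("total controls:", sum(counter.counts.values()))
```

Because only control tags are counted, the surrounding <td> cells never show up in the result, whether a control sits inside a table or outside it.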
What exactly is the first thing you want to do?
#3
Thank you so much @Pedroski55 for taking the time to reply. After a lot of reading across many websites, mainly Stack Overflow, I've got through all the steps I mentioned earlier. My next challenge is parsing the created XML and comparing it with an existing baseline XML.

The comparison should check that both attributes and values match exactly, with a few exceptions.
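
One way such a comparison might be approached, sketched with the standard library's xml.etree.ElementTree. The function name diff_xml, its ignore parameter and the two sample documents are hypothetical. The "few exceptions" are modelled here as a set of attribute names to skip, which may or may not match the real requirement, and children are compared in document order.

```python
import xml.etree.ElementTree as ET


def diff_xml(baseline, created, ignore=frozenset(), path=""):
    """Return a list of human-readable differences between two elements."""
    here = f"{path}/{baseline.tag}"
    if baseline.tag != created.tag:
        return [f"{here}: tag is {created.tag!r}"]
    diffs = []
    # Compare every attribute present on either side, except the ignored ones.
    names = (set(baseline.attrib) | set(created.attrib)) - set(ignore)
    for name in sorted(names):
        old, new = baseline.get(name), created.get(name)
        if old != new:
            diffs.append(f"{here}@{name}: {old!r} != {new!r}")
    # Compare text values, ignoring surrounding whitespace.
    if (baseline.text or "").strip() != (created.text or "").strip():
        diffs.append(f"{here}: text {baseline.text!r} != {created.text!r}")
    if len(baseline) != len(created):
        diffs.append(f"{here}: {len(baseline)} children != {len(created)}")
    # Recurse into child elements pairwise, in document order.
    for old_child, new_child in zip(baseline, created):
        diffs.extend(diff_xml(old_child, new_child, ignore, here))
    return diffs


# Made-up documents: only the 'created' attribute and the <qty> value differ.
BASELINE = ET.fromstring('<job id="1" created="2024-03-01"><ref>21</ref><qty>1</qty></job>')
CREATED = ET.fromstring('<job id="1" created="2024-03-13"><ref>21</ref><qty>2</qty></job>')

for line in diff_xml(BASELINE, CREATED, ignore={"created"}):
    print(line)
```

An empty list means the two documents match under the chosen exceptions. If element order is allowed to differ, the pairwise zip would need replacing with a lookup on some key attribute.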


(Mar-13-2024, 10:23 AM)Pedroski55 Wrote: Whatever you want to do, 99.99% of the time, someone has already done it. stackoverflow.com is a good place to look!
