Python Forum

Full Version: Web scraping for search results
Hi, I was trying to extract search results but got the error "AttributeError: 'list' object has no attribute 'text'".

Please suggest.

# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
# specify the url
urlpage = 'https://RP&recordedDateRange=18000101%2C20220506&searchOcrText=false&searchType=quickSearch' 
print(urlpage)
# run chrome webdriver from executable path of your choice
driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
# get web page
driver.get(urlpage)
# execute script to scroll down the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
# sleep for 30s
time.sleep(30)
# driver.quit()
# find elements by xpath

results = driver.find_elements_by_xpath("/html/body/div[2]/main/div[2]/div/div/article/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[3]").text

print('Number of results', len(results))

# create empty array to store data
data = []
# loop over results
for result in results:
    product_list = result.text
    
    # close driver 
driver.quit()
# save to pandas dataframe
df = pd.DataFrame(results)
print(df)
# write to csv
df.to_csv(r'C:\Users\\Desktop\list.csv')

Error

Output:
runfile('C:/Users/.spyder-py3/untitled1.py', wdir='C:/Users/.spyder-py3')
https:/results?department=RP&recordedDateRange=18000101%2C20220506&searchOcrText=false&searchType=quickSearch
C:\Users\.spyder-py3\untitled1.py:11: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
C:\Users\.spyder-py3\untitled1.py:23: DeprecationWarning: find_elements_by_xpath is deprecated. Please use find_elements(by=By.XPATH, value=xpath) instead
  results = driver.find_elements_by_xpath("/html/body/div[2]/main/div[2]/div/div/article/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[3]").text
Traceback (most recent call last):
  File "C:\Users\.spyder-py3\untitled1.py", line 23, in <module>
    results = driver.find_elements_by_xpath("/html/body/div[2]/main/div[2]/div/div/article/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[3]").text
AttributeError: 'list' object has no attribute 'text'
So, find_elements_by_xpath returns a list (the name suggests that, as "elements" is plural), and a list has no .text attribute. You should probably just drop the .text there; you're later iterating over the list and asking for .text on each item.
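To make the point above concrete, here is a minimal sketch that runs without a browser. `FakeElement` is a hypothetical stand-in for a Selenium WebElement (it only models the `.text` attribute), and the cell text is made up.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""

    def __init__(self, text):
        self.text = text


# find_elements_* hands back a list like this one, even if only one element matches.
results = [FakeElement("first cell")]

try:
    results.text  # same mistake as in the question: .text on the list itself
except AttributeError as exc:
    error_message = str(exc)

# The fix: ask each element in the list for its .text.
texts = [element.text for element in results]

print(error_message)
print(texts)
```

With a real driver the list comprehension stays the same; only `results` would come from the `find_elements` call.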
(May-14-2022, 06:41 AM) ndc85430 wrote: So, find_elements_by_xpath returns a list (the name suggests that, as "elements" is plural). You should probably just drop the .text there I guess? You're later iterating on the list and asking for .text on each item.

Output:
runfile('C:/Users/.spyder-py3/untitled1.py', wdir='C:/Users/.spyder-py3')
https://results?department=RP&recordedDateRange=18000101%2C20220506&searchOcrText=false&searchType=quickSearch
C:\Users\.spyder-py3\untitled1.py:11: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
C:\Users\.spyder-py3\untitled1.py:23: DeprecationWarning: find_elements_by_xpath is deprecated. Please use find_elements(by=By.XPATH, value=xpath) instead
  results = driver.find_elements_by_xpath("/html/body/div[2]/main/div[2]/div/div/article/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[3]")
Number of results 1
                                                   0
0  <selenium.webdriver.remote.webelement.WebEleme...
(May-14-2022, 07:20 AM) JOE wrote: (the same output as above, ending in "Number of results 1" and a truncated WebElement repr)

I have also tried to get rid of the deprecation warning by switching to find_elements:
results = driver.find_elements(by=By.XPATH, value="/html/body/div[2]/main/div[2]/div/div/article/div[3]/div/div[2]/div[1]/table/tbody/tr[1]/td[3]")
However, it raises the error "NameError: name 'By' is not defined".
(May-14-2022, 07:33 AM) JOE wrote: however it has error "NameError: name 'By' is not defined"
You need this import:
from selenium.webdriver.common.by import By
Look at this Thread; it shows the updates for the new Selenium 4.
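As a rough illustration of the Selenium 4 style mentioned above, the sketch below passes the chromedriver path through a Service object and uses `find_elements`-style locators with `By`. The function name `scrape_cell_texts` and its parameters are hypothetical, and the explicit wait is a swapped-in alternative to the fixed `time.sleep(30)` from the question, not something the thread prescribes.

```python
def scrape_cell_texts(url, driver_path, xpath, timeout=30):
    """Open url in Chrome and return the text of every element matching xpath."""
    # Imported inside the function so the sketch can be loaded
    # even on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Selenium 4: the driver path goes into a Service object
    # instead of being passed to webdriver.Chrome directly.
    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(url)
        # Wait until at least one matching element is present
        # (assumption: this replaces the fixed 30 second sleep).
        cells = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.XPATH, xpath))
        )
        # Read the text while the browser is still open.
        return [cell.text for cell in cells]
    finally:
        driver.quit()
```

Calling it would look like `scrape_cell_texts(urlpage, r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe', "<your xpath>")`, which returns plain strings that can go straight into a DataFrame.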
(May-14-2022, 08:01 AM) snippsat wrote: Need:
from selenium.webdriver.common.by import By
Look at this Thread; it shows the updates for the new Selenium 4.

Output:
runfile('C:/Users/.spyder-py3/untitled1.py', wdir='C:/Users/.spyder-py3')
https://results?department=RP&recordedDateRange=18000101%2C20220506&searchOcrText=false&searchType=quickSearch
C:\Users\.spyder-py3\untitled1.py:12: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver=webdriver.Chrome(r'C:\Users\Desktop\webdrivers\chromedriver_win32\chromedriver.exe')
Number of results 1
                                                   0
0  <selenium.webdriver.remote.webelement.WebEleme...
(May-14-2022, 08:34 AM) JOE wrote: (the same output as above)

Output:
CSV Output:
<selenium.webdriver.remote.webelement.WebElement (session="946b99349430927eab70c642de218269", element="679bb77f-10ef-4b72-b724-49e1904a527b")>
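The CSV contains a WebElement repr because the script in the first post builds the DataFrame from `results` (the element objects); the loop reads `result.text` into `product_list` but never stores it in `data`. A minimal sketch of the fix is to collect the text first and write only strings. `FakeElement` and the two cell values are made up for illustration, and the standard `csv` module is used here instead of pandas so that the sketch runs without extra packages.

```python
import csv
import io


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is modelled."""

    def __init__(self, text):
        self.text = text


# Made-up cells standing in for what find_elements would return.
results = [FakeElement("DEED"), FakeElement("MORTGAGE")]

# Collect the text of each element, one row per element,
# before the driver is closed.
data = [[result.text] for result in results]

# Write the strings, not the element objects.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(data)
csv_text = buffer.getvalue()

print(csv_text)
```

With pandas, the same `data` list could be passed to `pd.DataFrame(data)` in place of `results`.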
Hi, I have changed the code and am now successfully getting the required output.
I need help understanding what goes wrong next: I also have to get the data under col-7, col-8, col-9 and col-10.
But when I add those lines of code, I get the error "unindent does not match any outer indentation level".

The working code is given below:

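import xlsxwriter
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

element_list = []

# start one browser and reuse it for every page
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for page in range(1, 3):
    page_url = "/results?_=false&searchType=quickSearch" + str(page)
    driver.get(page_url)
    # each column of the results table has its own class name
    title = driver.find_elements(By.CLASS_NAME, "col-3")
    name = driver.find_elements(By.CLASS_NAME, "col-4")
    doc_type = driver.find_elements(By.CLASS_NAME, "col-5")  # renamed: "type" shadows the builtin
    date = driver.find_elements(By.CLASS_NAME, "col-6")

    # read the text while the page is still loaded
    for i in range(len(title)):
        element_list.append([title[i].text, name[i].text, doc_type[i].text, date[i].text])

# all pages are scraped, so the browser can be shut down
driver.quit()

with xlsxwriter.Workbook(r'C:\Users\Desktop\list.xlsx') as workbook:
    worksheet = workbook.add_worksheet()

    # one spreadsheet row per scraped table row
    for row_num, data in enumerate(element_list):
        worksheet.write_row(row_num, 0, data)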
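On the "unindent does not match any outer indentation level" error: it usually means the newly added lines are indented with different whitespace (for example spaces) than the block around them (for example tabs), so using one style consistently should make it go away. One way to avoid adding a line per column is to loop over the class names, as in the sketch below. `collect_rows`, `FakeCell` and `FakeDriver` are hypothetical names, the fake classes only exist so the sketch runs without a browser, and `"class name"` is the string value behind Selenium's `By.CLASS_NAME`.

```python
CLASS_NAME = "class name"  # same string value as selenium's By.CLASS_NAME

COLUMN_CLASSES = ["col-3", "col-4", "col-5", "col-6",
                  "col-7", "col-8", "col-9", "col-10"]


def collect_rows(driver, column_classes=COLUMN_CLASSES):
    """Return one list of cell texts per table row."""
    # One list of texts per column, in the order of column_classes.
    columns = [
        [cell.text for cell in driver.find_elements(CLASS_NAME, css_class)]
        for css_class in column_classes
    ]
    # zip stops at the shortest column, so a column with fewer cells
    # cannot cause an IndexError.
    return [list(row) for row in zip(*columns)]


class FakeCell:
    """Hypothetical stand-in for a WebElement; only .text is modelled."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in for a webdriver, backed by a dict of columns."""

    def __init__(self, table):
        self.table = table

    def find_elements(self, by, value):
        return [FakeCell(text) for text in self.table.get(value, [])]


# Made-up data: two rows, two columns.
demo_driver = FakeDriver({"col-3": ["a", "b"], "col-4": ["c", "d"]})
demo_rows = collect_rows(demo_driver, ["col-3", "col-4"])
print(demo_rows)
```

In the real script, `element_list.extend(collect_rows(driver))` inside the page loop would take the place of the four `find_elements` calls and the inner `for` loop.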