Python Forum
web scraping with selenium and bs4
#1
hi,
I am trying to scrape a website that contains text and links. I am writing a scraper that collects the text with BeautifulSoup and requests, and the links with Selenium.

Everything works fine in the requests part, but not in the Selenium part.

In the Selenium part I need to click a link, which opens in a new window; grab that page's URL; go back to the main page; and then repeat the same procedure for the next link. But when I run the code it gets only the first link and then throws an error.

Error:
Traceback (most recent call last):
  File "C:\Users\prince.bhatia\Desktop\Bihar_rera\Bihar_Rera.py", line 90, in <module>
    main()
  File "C:\Users\prince.bhatia\Desktop\Bihar_rera\Bihar_Rera.py", line 89, in main
    parsedata()
  File "C:\Users\prince.bhatia\Desktop\Bihar_rera\Bihar_Rera.py", line 80, in parsedata
    geta = i.find_elements_by_tag_name("a")[1]
  File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 237, in find_elements_by_tag_name
    return self.find_elements(by=By.TAG_NAME, value=name)
  File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 527, in find_elements
    {"using": by, "value": value})['value']
  File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webelement.py", line 493, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 256, in execute
    self.error_handler.check_response(response)
  File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=67.0.3396.99)
  (Driver info: chromedriver=2.39.562718 (9a2698cba08cf5a471a29d30c8b3e12becabb0e9), platform=Windows NT 6.1.7601 SP1 x86_64)


Below is my code:

from bs4 import BeautifulSoup
import requests
import csv
from selenium import webdriver
from selenium.webdriver.common import keys
from selenium.webdriver.support.ui import Select
import time
import functools
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

url = "https://nagarseva.bihar.gov.in/rerabihar/ReraGetProjectStatus.aspx"
final_data = []

def writefiles(alldata, filename):
    # newline="" prevents blank rows between records in the CSV on Windows
    with open("./" + filename, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        for row in alldata:
            writer.writerow(row)


def getbyGet(url, values):
    res = requests.get(url, data=values)
    text = res.text
    return text

def parsedata():
    payload = {}
    global url, final_data
    data = getbyGet(url, {})
    soup = BeautifulSoup(data, "html.parser")
    #EVENTTARGET = soup.select("#__EVENTTARGET")[0]['value']
    EVENTVALIDATION = soup.select("#__EVENTVALIDATION")[0]['value']
    #print(EVENTVALIDATION)
    VIEWSTATE = soup.select("#__VIEWSTATE")[0]['value']
    #print(VIEWSTATE)
    #VIEWSTATEGENERATOR = soup.select("#__VIEWSTATEGENERATOR")[0]["value"]
    headers= {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
              'Content-Type':'application/x-www-form-urlencoded',
              'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'}

    formfields =  {"__EVENTARGUMENT":"PrintIndicator$0",
                   '__EVENTTARGET':"ctl00$ContentPlaceHolder1$GV_Building",
                   '__EVENTVALIDATION':EVENTVALIDATION,
                  #'__EVENTTARGET':EVENTTARGET,
                  '__VIEWSTATE':VIEWSTATE,
                  "__VIEWSTATEENCRYPTED":"",
                  "__VIEWSTATEGENERATOR":"CE676888",
                   }
    s = requests.session()
    res = s.post(url, data=formfields, headers=headers).text
    soup = BeautifulSoup(res, "html.parser")
    
    getdata = soup.find_all("div", {"class":"col-lg-8 col-md-8 text-left"})
    for i in getdata:
        datas = i.find_all("h4")
        for getspan in datas:
            Buildername = getspan.find_all("span")[2].text
            projectname = getspan.find_all("span")[3].text
        getp = i.find_all("p")
        for data in getp:
            address = data.find_all("span")[2].text
            area = data.find_all("span")[5].text
            district = data.find_all("span")[8].text
            stardate = data.find_all("span")[11].text
            enddate = data.find_all("span")[12].text
            status = data.find_all("span")[13].text
    
    driver = webdriver.Chrome("./chromedriver")
    driver.get('https://nagarseva.bihar.gov.in/rerabihar/ReraGetProjectStatus.aspx')
    d = driver.find_element_by_xpath('/html/body/form/div[3]/div[2]/table/tbody/tr/td/table/tbody/tr[1]/td[1]/div/table/tbody/tr[2]/td[3]/input')
    d.click()
    getclass = driver.find_elements_by_css_selector(".col-lg-3.col-md-3")
    for i in getclass:
        sublist = []
        time.sleep(2)
        geta = i.find_elements_by_tag_name("a")[1]
        geta.click()
        window_before = driver.window_handles[0]
        driver.switch_to_window(driver.window_handles[-1])
        d = driver.current_url
        print(d)
        sublist.append(d)
        driver.switch_to_window(window_before)
def main():
    parsedata()
main()
Please help with this one.
#2
Hello! Looking at the error message, I think the element you want to interact with is no longer part of the DOM. I don't have time to play with or examine the code. Open the browser and hit CTRL+SHIFT+C to bring up the dev tools. If you don't see the DOM link/button, open the settings and tick the box next to it. Then inspect the DOM tree.
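That exception usually means the click (or the ASP.NET postback) rebuilt the page, so the element references collected beforehand now point at nodes that no longer exist. One common workaround is to harvest the data you need (here, the hrefs) before any navigation happens. A minimal sketch, using the same Selenium 3 method names and CSS selector as your post; the helper name is mine:

```python
# Sketch: avoid StaleElementReferenceException by grabbing every href
# up front, *before* any click or navigation can invalidate the elements.
# Assumes the Selenium 3 API (find_elements_by_*) from the original post.

def collect_project_links(driver):
    """Return the second <a> href from each .col-lg-3.col-md-3 block."""
    links = []
    for block in driver.find_elements_by_css_selector(".col-lg-3.col-md-3"):
        anchors = block.find_elements_by_tag_name("a")
        if len(anchors) > 1:           # skip blocks without a second link
            links.append(anchors[1].get_attribute("href"))
    return links
```

Once you have plain URL strings, you can visit them with `driver.get(url)` (or even requests) one by one; strings can never go stale.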
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#3
The error occurs at line 79, which is this:

geta = i.find_elements_by_tag_name("a")[1]

This is how it should work: the main page has links; click a link, which opens a new page; grab that page's URL; close it; go back to the main page; and start the process again for the next link.
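The click-grab-return loop described above can be sketched so that the element list is re-located on every iteration (by index) rather than held across navigations, which is what triggers the stale reference. This is only a sketch under the assumptions of the post (Selenium 3 method names, the same CSS selector, the link opening in a new window); the function name is mine:

```python
# Sketch of the loop: re-find the blocks on each pass so a refreshed DOM
# never leaves us holding a stale element, and close each detail window
# before switching back to the main one. Selenium 3 API, as in the post.

def visit_links(driver, css=".col-lg-3.col-md-3"):
    urls = []
    main_window = driver.current_window_handle
    count = len(driver.find_elements_by_css_selector(css))
    for idx in range(count):
        # Re-locate the element list every iteration by index;
        # references from a previous pass may already be stale.
        block = driver.find_elements_by_css_selector(css)[idx]
        block.find_elements_by_tag_name("a")[1].click()
        driver.switch_to.window(driver.window_handles[-1])
        urls.append(driver.current_url)     # grab the detail-page URL
        driver.close()                      # close the detail window
        driver.switch_to.window(main_window)
    return urls
```

Note the use of `driver.switch_to.window(...)` rather than the deprecated `switch_to_window(...)` from the original code; both work on Selenium 3, but only the former survives an upgrade.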