web scraping with selenium and bs4 - Python Forum (https://python-forum.io), Web Scraping & Web Development
web scraping with selenium and bs4 - Prince_Bhatia - Sep-18-2018

Hi, I am trying to scrape a website that has both text and links. My scraper collects the text with BeautifulSoup and requests, and the links with Selenium. The requests part works fine, but the Selenium part does not. In the Selenium part, the script has to click a link, let it open, read the page URL, go back to the main page, and repeat the same steps for the next link. When I run the code, it gets only the first link and then throws an error. Here is my code:

    from bs4 import BeautifulSoup
    import requests
    import csv
    from selenium import webdriver
    from selenium.webdriver.common import keys
    from selenium.webdriver.support.ui import Select
    import time
    import functools
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions

    url = "https://nagarseva.bihar.gov.in/rerabihar/ReraGetProjectStatus.aspx"
    final_data = []

    def writefiles(alldata, filename):
        with open ("./"+ filename, "w") as csvfile:
            csvfile = csv.writer(csvfile, delimiter=",")
            csvfile.writerow("")
            for i in range(0, len(alldata)):
                csvfile.writerow(alldata[i])

    def getbyGet(url, values):
        res = requests.get(url, data=values)
        text = res.text
        return text

    def parsedata():
        payload = {}
        global url, final_data
        data = getbyGet(url, {})
        soup = BeautifulSoup(data, "html.parser")
        #EVENTTARGET = soup.select("#__EVENTTARGET")[0]['value']
        EVENTVALIDATION = soup.select("#__EVENTVALIDATION")[0]['value']
        #print(EVENTVALIDATION)
        VIEWSTATE = soup.select("#__VIEWSTATE")[0]['value']
        #print(VIEWSTATE)
        #VIEWSTATEGENERATOR = soup.select("#__VIEWSTATEGENERATOR")[0]["value"]
        headers= {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                  'Content-Type':'application/x-www-form-urlencoded',
                  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'}

        formfields = {"__EVENTARGUMENT":"PrintIndicator$0",
                      '__EVENTTARGET':"ctl00$ContentPlaceHolder1$GV_Building",
                      '__EVENTVALIDATION':EVENTVALIDATION,
                      #'__EVENTTARGET':EVENTTARGET,
                      '__VIEWSTATE':VIEWSTATE,
                      "__VIEWSTATEENCRYPTED":"",
                      "__VIEWSTATEGENERATOR":"CE676888",
                      }

        s = requests.session()
        res = s.post(url, data=formfields, headers=headers).text
        soup = BeautifulSoup(res, "html.parser")
        getdata = soup.find_all("div", {"class":"col-lg-8 col-md-8 text-left"})
        for i in getdata:
            datas = i.find_all("h4")
            for getspan in datas:
                Buildername = getspan.find_all("span")[2].text
                projectname = getspan.find_all("span")[3].text
            getp = i.find_all("p")
            for data in getp:
                address = data.find_all("span")[2].text
                area = data.find_all("span")[5].text
                district = data.find_all("span")[8].text
                stardate = data.find_all("span")[11].text
                enddate = data.find_all("span")[12].text
                status = data.find_all("span")[13].text

        driver = webdriver.Chrome("./chromedriver")
        driver.get('https://nagarseva.bihar.gov.in/rerabihar/ReraGetProjectStatus.aspx')
        d = driver.find_element_by_xpath('/html/body/form/div[3]/div[2]/table/tbody/tr/td/table/tbody/tr[1]/td[1]/div/table/tbody/tr[2]/td[3]/input')
        d.click()
        getclass = driver.find_elements_by_css_selector(".col-lg-3.col-md-3")
        for i in getclass:
            sublist = []
            time.sleep(2)
            geta = i.find_elements_by_tag_name("a")[1]
            geta.click()
            window_before = driver.window_handles[0]
            driver.switch_to_window(driver.window_handles[-1])
            d = driver.current_url
            print(d)
            sublist.append(d)
            driver.switch_to_window(window_before)

    def main():
        parsedata()

    main()

Please help with this one.

RE: web scraping with selenium and bs4 - wavic - Sep-18-2018

Hello! Looking at the error message, I think the element you want to interact with is not part of the DOM. I don't have time to run or examine the code, but open the browser and press Ctrl+Shift+C to bring up the developer tools. If you don't see the DOM tab/button, open the settings and tick the checkbox next to it. Then inspect the DOM tree.
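On the requests side of the first post, `__VIEWSTATE`, `__EVENTVALIDATION` and `__EVENTTARGET` are the hidden fields an ASP.NET WebForms page posts back to itself. The code reads two of them but hard-codes `__VIEWSTATEGENERATOR` as `"CE676888"`, which breaks if the site changes that value. A minimal sketch that collects every hidden input instead, assuming they are marked `type="hidden"` and carry a `name` attribute. `hidden_fields` and `postback_payload` are hypothetical helpers, not part of BeautifulSoup or requests:

```python
from bs4 import BeautifulSoup


def hidden_fields(html):
    """Return every <input type="hidden"> on the page as a name -> value dict.

    On an ASP.NET WebForms page this picks up __VIEWSTATE,
    __VIEWSTATEGENERATOR, __EVENTVALIDATION and similar fields,
    so none of them has to be hard-coded.
    """
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("value", "")
        for tag in soup.find_all("input", type="hidden")
        if tag.get("name")  # skip unnamed inputs; they are never submitted
    }


def postback_payload(html, event_target, event_argument=""):
    """Build form data equivalent to __doPostBack(event_target, event_argument)
    fired from the page whose HTML is given."""
    fields = hidden_fields(html)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return fields
```

To use it, fetch the page with `s.get(url).text` and send `postback_payload(html, "ctl00$ContentPlaceHolder1$GV_Building", "PrintIndicator$0")` with `s.post(url, data=...)`, where `s` is one `requests.Session()` shared by both calls. The first post does the GET with a bare `requests.get` and the POST from a separate session, so any cookie the server sets on the GET is not sent with the postback. (`getbyGet` also passes `values` as `data=`, which becomes a request body; query-string parameters belong in `params=`.)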
RE: web scraping with selenium and bs4 - Prince_Bhatia - Sep-18-2018

The error occurs at line 79, which is this one:

    geta = i.find_elements_by_tag_name("a")[1]

This is how it should work: the main page has links. Click a link, let the URL open, grab the URL, close it, go back to the main page, and start the process again for the next link.
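A minimal sketch of that loop. It rests on two assumptions the thread does not confirm: that each click opens the detail page in a new window (the switch to `window_handles[-1]` in the first post implies this), and that the error is the stale-element kind wavic suggests (the traceback itself is not shown). Instead of iterating over the `getclass` list captured once, it re-finds the cards by index on every pass, skips cards without a second link (which would otherwise raise `IndexError` on `[1]`), closes each popup explicitly, and switches back to the saved main handle. `collect_popup_urls` and `wait_until` are hypothetical names; `wait_until` is a plain polling loop standing in for Selenium's `WebDriverWait` so the sketch has no import-time dependency on Selenium. The driver calls (`find_elements(by, value)`, `window_handles`, `current_window_handle`, `switch_to.window`, `close`) follow the Selenium 4 API.

```python
import time

# Selenium locator strategies are plain strings: these are the values of
# By.CSS_SELECTOR and By.TAG_NAME, spelled out so selenium need not be imported.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"


def wait_until(condition, timeout=10.0, poll=0.25):
    """Call condition() until it returns something truthy and return that value.

    A simple stand-in for selenium's WebDriverWait(driver, timeout).until(...).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def collect_popup_urls(driver, card_selector=".col-lg-3.col-md-3", link_index=1):
    """For each card, click its link_index-th <a>, record the URL of the
    window it opens, close that window and switch back to the main one."""
    main_window = driver.current_window_handle
    urls = []
    card_count = len(driver.find_elements(CSS_SELECTOR, card_selector))
    for index in range(card_count):
        # Look the cards up again on every pass rather than reusing references
        # found before the first click, in case the main page was re-rendered.
        cards = driver.find_elements(CSS_SELECTOR, card_selector)
        if index >= len(cards):
            break  # the page now has fewer cards than at the start
        links = cards[index].find_elements(TAG_NAME, "a")
        if len(links) <= link_index:
            continue  # no link at that position in this card; skip it
        handles_before = set(driver.window_handles)
        links[link_index].click()
        # Wait for a window handle that did not exist before the click.
        popup = wait_until(
            lambda: next((h for h in driver.window_handles
                          if h not in handles_before), None))
        driver.switch_to.window(popup)
        # A freshly opened window can report about:blank until it loads.
        url = wait_until(
            lambda: driver.current_url if driver.current_url != "about:blank" else None)
        urls.append(url)
        driver.close()  # closes the current window, i.e. the popup
        driver.switch_to.window(main_window)
    return urls
```

With the driver from the first post, after the search button has been clicked, `urls = collect_popup_urls(driver)` would take the place of the `for i in getclass:` loop. On current Selenium releases the `find_element_by_*` helpers and `switch_to_window` used in the first post have been removed, so those lines also need the `find_element(By.XPATH, ...)` and `switch_to.window(...)` forms.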