Python Forum
How do I scrape a website whose pages change using the JavaScript __doPostBack function and
#1
I am scraping a website that has multiple pages and switches between them with JavaScript links like these:

<td><a href="javascript:__doPostBack('gv_AgentList1','Page$2')">2</a></td><td><a href="javascript:__doPostBack('gv_AgentList1','Page$3')">3</a></td>
Clicking one of these links changes the page, and the change shows up in the __EVENTARGUMENT field in the DOM, like this:
__EVENTARGUMENT:Page$2
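For reference, in ASP.NET Web Forms the two arguments of __doPostBack are what end up in the __EVENTTARGET and __EVENTARGUMENT form fields. A minimal sketch of pulling them out of a pager link follows; the helper name parse_postback is made up for illustration, and the pattern assumes the href looks exactly like the ones quoted above.

```python
import re

# Matches __doPostBack('<target>','<argument>') as it appears in the pager links.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = POSTBACK_RE.search(href)
    return match.groups() if match else None


print(parse_postback("javascript:__doPostBack('gv_AgentList1','Page$2')"))
# → ('gv_AgentList1', 'Page$2')
```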
I tried to loop over the pages, but I get the first page's results every time. Can anyone help? My code is below:

from bs4 import BeautifulSoup
import requests
import csv
import sqlite3

url = "https://rera.cgstate.gov.in/"
final_data = []

def getdatabyget(url,values):
    res = requests.get(url,values)
    text = res.text
    return text


def readheaders():
    global url, final_data
    for i in range(1, 4):
        argument =  "Page$"+ str(i+1)
        htmldata = getdatabyget(url, {})
        soup  = BeautifulSoup(htmldata, "html.parser")
        EVENTVALIDATION = soup.select("#__EVENTVALIDATION")[0]['value']
        VIEWSTATE = soup.select("#__VIEWSTATE")[0]['value']
        headers= {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                  'Content-Type':'application/x-www-form-urlencoded',
                  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'}

        formfields = {"__ASYNCPOST":"true",
                      "__EVENTARGUMENT":argument,   
                      "__EVENTTARGET":"gv_AgentList1",  
                      "__EVENTVALIDATION":EVENTVALIDATION,
                      "__LASTFOCUS":"",
                      "__VIEWSTATE":VIEWSTATE,
                      "ApplicantType":"0",
                      "Button1":"Search",
                      "color_value":"0",
                      "District_Name":"0",
                      "DropDownList1":"0",
                      "DropDownList2":"0",
                      "DropDownList4":"0",
                      "DropDownList5":"0",
                      "group1":"on",
                      "hdnSelectedOption":"0",
                      "hdnSelectedOptionForContractor":"0",
                      "language_value":"0",
                      "Mobile":"",  
                      "Tehsil_Name":"0",
                      "TextBox1":"",    
                      "TextBox2":"",    
                      "TextBox3":"",    
                      "TextBox4":"",    
                      "TextBox5":"",    
                      "TextBox6":"",
                      "ToolkitScriptManager1":"appr1|Button1",
                      "txt_otp":"", 
                      "txt_proj_name":"",   
                      "txtRefNo":"",    
                      "txtRefNoForContractor":""}
        s = requests.session()
        res = s.post(url, data=formfields, headers=headers).text
        soup = BeautifulSoup(res, "html.parser")
        data = soup.find_all("table")[0]
        gettr = data.find_all("tr")[1:-2]
        for row in gettr:
            # Look up the cells once per row instead of once per column.
            cells = row.find_all("td")
            projectname = cells[0].text
            reranumber = cells[1].text.replace(" ","")
            Authorised = cells[2].text.replace("\n","")
            promoternme = cells[3].text.replace("\n","")
            projecttype = cells[4].text.replace("\n","")
            district = cells[5].text.replace("\n","")
            tehsil = cells[6].text.replace("\n","")
            approveddate = cells[7].text.replace("\n","")
            enddate = cells[8].text.replace("\n","")
            add_list = [projectname]
            print(add_list)

readheaders()
That is the code. How can I solve this? Please enlighten me.
#2
Use Selenium with a headless browser such as PhantomJS, Firefox, or Chrome.
You can see how it works here:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2

You can scrape the site with the webdriver class alone, or use Selenium just to fetch the content and parse it with bs4. Take a look :)
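If a full browser is more than you want, it may also be possible to stay with plain HTTP requests. The code in the first post fetches the page with a fresh GET and opens a new session on every pass, so each POST starts from page 1 state. The sketch below instead keeps one session and builds each POST from the hidden fields (__VIEWSTATE, __EVENTVALIDATION and so on) of the response before it. I have not run this against the site; the names hidden_fields, postback_payload and fetch_pages are made up for illustration, and the site may need other form fields from the original post as well. It also leaves out __ASYNCPOST, on the assumption that the server then returns a full HTML page rather than a partial update.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def postback_payload(html, target, argument):
    """Form data for __doPostBack(target, argument), based on the page just received."""
    payload = hidden_fields(html)  # __VIEWSTATE, __EVENTVALIDATION, ... as sent by the server
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload


def fetch_pages(session, url, target, last_page):
    """Yield the HTML of pages 1..last_page, reusing one session throughout."""
    html = session.get(url).text
    yield html
    for page in range(2, last_page + 1):
        # Each POST uses the hidden fields of the previous response, not of a fresh GET.
        data = postback_payload(html, target, f"Page${page}")
        html = session.post(url, data=data).text
        yield html
```

With requests, session would be requests.Session(), for example: for html in fetch_pages(requests.Session(), url, "gv_AgentList1", 4): ... and each html can then go to BeautifulSoup as before.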