How do I scrape a website whose pages change via the JavaScript __doPostBack function?
I am scraping a website that has multiple pages and switches between them with JavaScript links like these:

<td><a href="javascript:__doPostBack('gv_AgentList1','Page$2')">2</a></td><td><a href="javascript:__doPostBack('gv_AgentList1','Page$3')">3</a></td>
Clicking these links changes the page, and the change is reflected in the hidden __EVENTARGUMENT field in the DOM.
I tried to loop over the pages but received the same first-page results every time. Can anyone help? My code is below:

from bs4 import BeautifulSoup
import requests
import csv
import sqlite3

url = ""
final_data = []

def getdatabyget(url,values):
    res = requests.get(url,values)
    text = res.text
    return text

def readheaders():
    global url, final_data
    for i in range(1, 4):
        argument =  "Page$"+ str(i+1)
        htmldata = getdatabyget(url, {})
        soup  = BeautifulSoup(htmldata, "html.parser")
        VIEWSTATE = soup.select("#__VIEWSTATE")[0]['value']
        headers= {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0'}

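        # Assumed fields: the original dict is cut off after "__ASYNCPOST".
        formfields = {"__ASYNCPOST": "true",
                      "__EVENTTARGET": "gv_AgentList1",
                      "__EVENTARGUMENT": argument,
                      "__VIEWSTATE": VIEWSTATE}
        s = requests.session()
        res = s.post(url, data=formfields, headers=headers).text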
        soup = BeautifulSoup(res, "html.parser")
        data = soup.find_all("table")[0]
        gettr = data.find_all("tr")[1:-2]
        for i in gettr:
            add_list = []
            blank = ""
            projectname = i.find_all("td")[0].text
            reranumber = i.find_all("td")[1].text.replace(" ","")
            Authorised = i.find_all("td")[2].text.replace("\n","")
            promoternme = i.find_all("td")[3].text.replace("\n","")
            projecttype = i.find_all("td")[4].text.replace("\n","")
            district = i.find_all("td")[5].text.replace("\n","")
            tehsil = i.find_all("td")[6].text.replace("\n","")
            approveddate = i.find_all("td")[7].text.replace("\n","")
            enddate = i.find_all("td")[8].text.replace("\n","")

That is the code. How can I solve this? Please enlighten me.
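One thing that may be worth trying before reaching for a browser: on ASP.NET WebForms pages, __doPostBack(target, argument) fills the hidden __EVENTTARGET and __EVENTARGUMENT fields and submits the form, so the paging links can often be replayed by posting all of the page's hidden fields back through a single session. The sketch below is only that, a sketch. The helper names (HiddenFieldParser, parse_postback_href, build_postback_payload, fetch_pages) are made up for illustration, and it assumes the site accepts a regular full postback, which is why __ASYNCPOST is left out. Each request reuses the hidden fields from the previous response instead of starting from a fresh GET.

```python
import re
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def parse_postback_href(href):
    """Extract (target, argument) from a javascript:__doPostBack(...) href."""
    match = re.search(r"__doPostBack\('([^']*)','([^']*)'\)", href)
    if match is None:
        raise ValueError("not a __doPostBack link: %r" % href)
    return match.group(1), match.group(2)


def build_postback_payload(html, target, argument):
    """Copy the page's hidden fields and set the two postback fields."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload


def fetch_pages(url, target, last_page):
    """Yield the HTML of pages 1..last_page, reusing one session throughout."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    with requests.Session() as session:
        html = session.get(url).text
        yield html
        for page in range(2, last_page + 1):
            # Hidden fields (e.g. __VIEWSTATE) come from the previous response.
            payload = build_postback_payload(html, target, "Page$%d" % page)
            html = session.post(url, data=payload).text
            yield html
```

For the links in the question this would be called as fetch_pages(url, "gv_AgentList1", 4), and each yielded page can go through the same BeautifulSoup table parsing as before. If the server still returns page 1, the browser's network tab shows which extra form fields the real postback sends.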
Use Selenium with a headless browser such as PhantomJS, Firefox, or Chrome.
You can see how it works here:

You can scrape the site with the webdriver class alone, or use Selenium just to fetch the content and parse it with bs4. Take a look :)
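A rough sketch of the second option (Selenium only fetches the pages, bs4 parses them) might look like the following. It uses headless Firefox, since recent Selenium releases no longer support PhantomJS. The function names postback_script and scrape_pages are invented for this example, and it assumes the first table on the page is the grid that gets replaced when the page changes, as in the question's code.

```python
import json


def postback_script(target, argument):
    """Build the JavaScript call that a paging link would run."""
    # json.dumps produces safely quoted JavaScript string literals.
    return "__doPostBack(%s, %s);" % (json.dumps(target), json.dumps(argument))


def scrape_pages(url, target, last_page):
    """Return the page source of pages 1..last_page using headless Firefox."""
    # Third-party imports live here so postback_script works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    pages = []
    try:
        driver.get(url)
        pages.append(driver.page_source)
        for page in range(2, last_page + 1):
            # Assumption: the first <table> is the grid replaced on paging.
            old_table = driver.find_element(By.TAG_NAME, "table")
            driver.execute_script(postback_script(target, "Page$%d" % page))
            # Wait until the old table has been removed from the DOM.
            WebDriverWait(driver, 10).until(EC.staleness_of(old_table))
            pages.append(driver.page_source)
    finally:
        driver.quit()
    return pages
```

Each string returned by scrape_pages(url, "gv_AgentList1", 4) can then be passed to BeautifulSoup(html, "html.parser") and parsed row by row exactly as in the original loop.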
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
