Random Loss of Control of Website When Scraping

bmccollum · (This post was last modified: Aug-30-2019, 10:21 AM by Larz60+.)

I've been writing a Python script to scrape a site. I'm accessing & controlling the site using Selenium.

Here's an example of a typical page on the site:

https://www.dibbs.bsm.dla.mil/Awards/Awd...08-16-2019

You'll probably get prompted with a warning when you first try to hit the above page, but it's 100% safe. It's a government site listing information on parts for aircraft. Just click that you're OK with proceeding to the site and the above page should render for you.

The problem I'm having is this: the script will successfully scrape the 50 records on a page, send a JavaScript "doPostBack" command to navigate to the next page, scrape the 50 records on that page, send another "doPostBack" command to land on the page after that, and so forth. But this process eventually breaks after a seemingly random number of pages, and from that point on the script can no longer navigate to any further pages.

For example I've seen the scrape successfully navigate and scrape 28 pages out of a possible 120 pages and seemingly be unable to navigate from page 28 to 29.

I've re-run the scrape and then seen the scrape successfully navigate/scrape 73 pages out of a possible 120 pages and then same thing... the Javascript "doPostBack" command absolutely will not allow for the navigation to page 74.

I've re-run / re-started the scrape again and have seen it then successfully navigate through all 120 pages without any issue.

I can then re-run it again and it might navigate through 95 of the 120 total pages and be unable to move on to page 96.

Etc... you get my drift. At some random point the page navigation attempts stop working, the Python script can't navigate to the remaining pages, and each further attempt just leaves the browser stuck on whatever page it's currently on.

I've tried every trick under the sun that I've read about online without any definitive resolution.

It's as if I've lost all control of the page from Python at a random point during each scrape attempt and the page simply won't respond to the same Javascript "doPostBack" command that it successfully responded to many times over and over up until that point.
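To make the "stuck" condition concrete, here is a minimal sketch of how a failed navigation could be detected, assuming the page exposes some value that changes from one page to the next (the current page number, the first record on the page, etc.). The helper name `wait_for_page_change` and the `read_marker` callable are hypothetical, and the Selenium calls mentioned in the comments are only one way the marker might be read.

```python
import time


def wait_for_page_change(read_marker, old_marker, timeout=15.0, poll=0.5):
    """Poll until read_marker() differs from old_marker.

    Returns True if the marker changed before the timeout, else False.
    read_marker is any zero-argument callable. With Selenium it might be
    something like:
        lambda: driver.find_element(By.ID, "some-id").text
    where "some-id" is a placeholder for an element that differs per page.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if read_marker() != old_marker:
                return True
        except Exception:
            # The page may be mid-reload (e.g. a stale element); keep polling.
            pass
        time.sleep(poll)
    return False
```

The idea would be to read the marker, send the "doPostBack" command, and then call the helper with the old marker value. A return value of False means the page never moved on within the timeout, which is the signal to retry or recover.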

The only thing I can think of to try at this point is a full browser restart. Once I've determined that a navigation attempt did *NOT* move on to the next page of 50 records and the browser is still "stuck" on the current page, I'd issue a "driver.quit()" command to close the existing browser session completely, instantiate a new browser with Python/Selenium, re-open the desired page, and send a JavaScript "doPostBack" command to try to jump directly to the page right after the last one that was scraped successfully. I'd then see how many additional pages I can navigate, and if it stalls out again, repeat the "driver.quit()" and restart cycle as many times as needed until the script has navigated through all remaining pages.
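The restart idea described above could be sketched roughly like this. It is only an outline under some assumptions: the function and parameter names (`scrape_all_pages`, `make_driver`, `open_listing`, `go_to_page`, `scrape_page`) are made up for illustration, and it assumes the site will accept a direct jump to an arbitrary page number from a fresh session, which I haven't verified.

```python
def scrape_all_pages(make_driver, open_listing, go_to_page, scrape_page,
                     total_pages, max_restarts=10):
    """Scrape pages 1..total_pages, restarting the browser when navigation stalls.

    All four callables are hypothetical hooks supplied by the caller:
      make_driver()          -> new browser session, e.g. webdriver.Chrome()
      open_listing(driver)   -> load the first results page, e.g. driver.get(url)
      go_to_page(driver, n)  -> send the postback for page n and return True
                                only if the browser really landed on page n
      scrape_page(driver, n) -> return a list of records from page n
    """
    records = []
    next_page = 1   # first page that has not been scraped yet
    restarts = 0
    while next_page <= total_pages:
        driver = make_driver()
        try:
            open_listing(driver)
            on_page = 1  # a fresh session is assumed to start on page 1
            while next_page <= total_pages:
                if on_page != next_page:
                    if not go_to_page(driver, next_page):
                        break  # navigation stalled; fall through to a restart
                    on_page = next_page
                records.extend(scrape_page(driver, next_page))
                next_page += 1
        finally:
            driver.quit()  # always close the session, even after an error
        if next_page <= total_pages:
            restarts += 1
            if restarts > max_restarts:
                raise RuntimeError(
                    f"gave up at page {next_page} after {max_restarts} restarts"
                )
    return records
```

Capping the number of restarts keeps a daily job from looping forever if the site is actually down rather than just stalled.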

Any thoughts or suggestions are greatly appreciated, as the page navigation issues are very much a show-stopper for me if I can't get the page navigation to function consistently somehow.

Also, this isn't a one-time scrape... it's a scrape that will ultimately be executed once every single day.

Thanks in advance!
