Dec-25-2019, 01:35 PM
I have a Python scraper that uses Selenium to scrape a dynamically loaded JavaScript website.
The scraper itself works fine, but pages sometimes fail to load with a 404 error.
The problem is that the public URL doesn't have the data I need but loads every time, while the JavaScript URL with the data I need sometimes won't load for a random length of time.
Even weirder, the same JavaScript URL loads in one browser but not in another, and vice versa.
I tried the webdrivers for Chrome, Firefox, Firefox Developer Edition and Opera. Not a single one loads all pages every time.
The public link that doesn't have the data I need looks like this: https://www.sazka.cz/kurzove-sazky/fotbal/*League*/.
The JavaScript link that has the data I need looks like this: https://rsb.sazka.cz/fotbal/*League*/.
On average, about 8 of around 30 links fail to load, although the same link loads flawlessly in a different browser at the same time.
I searched the page source for clues but found nothing.
Can anyone help me find out where the problem might be? Thank you.
Edit: here is the code that I think is relevant
Edit2: You can reproduce this problem by right-clicking on a league and opening the link in another tab. You can then see that, even though the page loaded properly at first, opening it in a new tab changes the start of the URL from https://www.sazka.cz to https://rsb.sazka.cz, and it sometimes gives a 404 error that can last for an hour or more.
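To make the relation between the two link forms concrete, here is a minimal sketch that rewrites a public league link into the rsb form. It assumes the only differences are the host and the leading /kurzove-sazky path segment, as in the two example links; `to_rsb_url` and `PUBLIC_PREFIX` are hypothetical names for illustration and are not part of the scraper.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: this path prefix appears only on the public www.sazka.cz links.
PUBLIC_PREFIX = "/kurzove-sazky"


def to_rsb_url(public_url):
    """Rewrite a public www.sazka.cz league link to the rsb.sazka.cz form."""
    parts = urlsplit(public_url)
    path = parts.path
    # Drop the public-only prefix so that /fotbal/<league>/ remains.
    if path.startswith(PUBLIC_PREFIX + "/"):
        path = path[len(PUBLIC_PREFIX):]
    # Keep scheme, query and fragment; swap only the host and the path.
    return urlunsplit((parts.scheme, "rsb.sazka.cz", path, parts.query, parts.fragment))
```

This only builds the link; it does not tell you whether the rsb page will actually load at that moment.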
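import random
import re
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# urls is the list of league links; it is built elsewhere in the script.
driver = webdriver.Chrome(executable_path='chromedriver',
                          service_args=['--ssl-protocol=any', '--ignore-ssl-errors=true'])
driver.maximize_window()

for single_url in urls:
    # Wait a random 4-6 seconds between page loads.
    randomLoadTime = random.randint(400, 600) / 100
    time.sleep(randomLoadTime)
    driver1 = driver
    driver1.get(single_url)
    htmlSourceRedirectCheck = driver1.page_source

    # Redirect check: look for the 404 message in the page source.
    redirectCheck = re.findall('404 - Page not found', htmlSourceRedirectCheck)
    if '404 - Page not found' in redirectCheck:
        leaguer1 = single_url
        leagueFinal = re.findall('fotbal/(.*?)/', leaguer1)
        print(str(leagueFinal) + ' ' + '404 - Page not found')
    else:
        # Wait up to 25 seconds for an expanded odds header to become clickable.
        try:
            loadedOddsCheck = WebDriverWait(driver1, 25)
            loadedOddsCheck.until(EC.element_to_be_clickable(
                (By.XPATH, ".//h3[contains(@data-params, 'hideShowEvents')]")))
        except TimeoutException:
            pass

        # Click every collapsed header so that its odds get loaded.
        unloadedOdds = driver1.find_elements_by_xpath(
            ".//h3[contains(@data-params, 'loadExpandEvents')]")
        for clicking in unloadedOdds:
            clicking.click()
            randomLoadTime2 = random.randint(50, 100) / 100
            time.sleep(randomLoadTime2)

        matchArr = []
        leaguer = single_url
        htmlSourceOrig = driver1.page_source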
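Since the 404s come and go, one possible workaround is to collect the links that failed and try them again later instead of skipping them. The following is only a sketch of that idea, not a fix for the underlying cause. `scrape_with_retries`, `league_name`, `load_page`, `max_rounds` and `wait_seconds` are hypothetical names, and the default values are arbitrary assumptions. Page loading is passed in as a function so the retry logic does not depend on any particular browser.

```python
import re
import time

# The same marker text the scraper already searches for in the page source.
NOT_FOUND_MARKER = "404 - Page not found"


def league_name(url):
    """Return the league segment after 'fotbal/', or the whole URL if absent."""
    match = re.search(r"fotbal/(.*?)/", url)
    return match.group(1) if match else url


def scrape_with_retries(urls, load_page, max_rounds=3, wait_seconds=60.0,
                        sleep=time.sleep):
    """Load every URL and retry the ones that show the 404 marker.

    load_page is any callable that takes a URL and returns the page source.
    Returns (pages, still_failing): a dict of URL -> source for pages that
    loaded, and a list of URLs that still showed the 404 after all rounds.
    """
    pages = {}
    pending = list(urls)
    for round_no in range(max_rounds):
        failed = []
        for url in pending:
            source = load_page(url)
            if NOT_FOUND_MARKER in source:
                failed.append(url)  # try this one again in the next round
            else:
                pages[url] = source
        pending = failed
        if not pending:
            break
        if round_no < max_rounds - 1:
            sleep(wait_seconds)  # give the site some time before retrying
    return pages, pending
```

With Selenium, `load_page` could be a small function that calls `driver.get(url)` and returns `driver.page_source`. Because the 404 can last for an hour or more, a few short retry rounds may not be enough, so the list of still-failing links is returned rather than dropped; `league_name` can be used to print which leagues those are.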