Python Forum
Scraping a JavaScript website with Selenium where pages randomly fail to load
I have a Python scraper that uses Selenium to scrape a dynamically loaded JavaScript website.
The scraper itself works, but pages sometimes fail to load with a 404 error.
The problem is that the public URL, which doesn't have the data I need, loads every time, while the JavaScript URL that does have the data sometimes won't load for a random length of time.
Even weirder, the same JavaScript URL loads in one browser but not in another, and vice versa.
I tried the WebDriver for Chrome, Firefox, Firefox Developer Edition and Opera. Not a single one loads all pages every time.
The public link that doesn't have the data I need looks like this: https://www.sazka.cz/kurzove-sazky/fotbal/*League*/
The JavaScript link that has the data I need looks like this: https://rsb.sazka.cz/fotbal/*League*/
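Because the two URL patterns differ only in host and path prefix, a small helper can derive one from the other and pull out the league slug. This is a minimal sketch that assumes every league follows exactly these two patterns; the function names and the slug `example-league` are hypothetical.

```python
import re

# Assumed prefixes, taken from the two URL patterns in the post
PUBLIC_PREFIX = "https://www.sazka.cz/kurzove-sazky/fotbal/"
JS_PREFIX = "https://rsb.sazka.cz/fotbal/"


def league_slug(url):
    """Return the <league> part of a .../fotbal/<league>/ URL, or None."""
    match = re.search(r"fotbal/([^/]+)/", url)
    return match.group(1) if match else None


def public_url(league):
    # Page that always loads but, per the post, lacks the odds data
    return PUBLIC_PREFIX + league + "/"


def js_url(league):
    # JavaScript page that carries the data but sometimes returns 404
    return JS_PREFIX + league + "/"


slug = league_slug("https://rsb.sazka.cz/fotbal/example-league/")
print(slug)              # example-league
print(public_url(slug))  # https://www.sazka.cz/kurzove-sazky/fotbal/example-league/
```

Unlike `re.findall`, which returns a list (printed as `['league']`), `re.search` with `group(1)` yields the slug string itself.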
On average, about 8 out of around 30 links fail to load, even though the same link loads flawlessly in a different browser at the same time.
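With roughly a quarter of loads failing, one common mitigation is to retry each URL a few times with exponential backoff before giving up. The sketch below is hypothetical: `load` stands for any function that takes a URL and returns page HTML (with Selenium, a function that calls `driver.get(url)` and returns `driver.page_source`), and the delays are arbitrary assumptions. Since the post reports 404s lasting an hour or more, short retries like these may not be enough on their own.

```python
import random
import time

NOT_FOUND_MARKER = "404 - Page not found"  # the text the original code checks for


def load_with_retries(load, url, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call load(url) until the returned HTML lacks the 404 marker.

    Returns the HTML, or None if every attempt hit the 404 page.
    """
    for attempt in range(attempts):
        html = load(url)
        if NOT_FOUND_MARKER not in html:
            return html
        if attempt < attempts - 1:
            # Exponential backoff with jitter: about 5 s, 10 s, 20 s, ...
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return None


# Usage with a fake loader that fails once, then succeeds (sleep stubbed out)
responses = iter(["<h1>404 - Page not found</h1>", "<html>odds</html>"])
html = load_with_retries(lambda u: next(responses),
                         "https://rsb.sazka.cz/fotbal/x/",
                         sleep=lambda seconds: None)
print(html)  # <html>odds</html>
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.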
I searched the page source for clues but found nothing.
Can anyone help me find out where the problem might be? Thank you.

Edit: here is the code I think is relevant.

Edit 2: You can reproduce the problem by right-clicking a league and opening the link in a new tab. Even though the page loaded properly at first, in the new tab the start of the URL changes from https://www.sazka.cz to https://rsb.sazka.cz, and the page sometimes gives a 404 error that can last for an hour or more.
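To check whether failing loads are the ones where the host switched as described in Edit 2, each load can be labeled by comparing the requested URL with `driver.current_url` and scanning `driver.page_source` for the 404 text. This is a minimal sketch; `classify_load` is a hypothetical helper, and it assumes the 404 page always contains the exact text the original code looks for.

```python
from urllib.parse import urlparse

NOT_FOUND_MARKER = "404 - Page not found"


def classify_load(requested_url, current_url, html):
    """Label a finished page load as 'not_found', 'redirected' or 'ok'.

    requested_url: URL passed to driver.get()
    current_url:   driver.current_url after the load
    html:          driver.page_source after the load
    """
    if NOT_FOUND_MARKER in html:
        return "not_found"
    if urlparse(current_url).hostname != urlparse(requested_url).hostname:
        # e.g. www.sazka.cz -> rsb.sazka.cz, the switch described in Edit 2
        return "redirected"
    return "ok"


print(classify_load("https://www.sazka.cz/kurzove-sazky/fotbal/x/",
                    "https://rsb.sazka.cz/fotbal/x/",
                    "<html></html>"))  # redirected
```

Logging this label next to the browser name for every URL would show whether the failures correlate with the host switch or with a particular browser.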

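import random
import re
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# --ssl-protocol and --ignore-ssl-errors are PhantomJS flags, not chromedriver ones;
# Chrome's switch for skipping certificate checks is --ignore-certificate-errors.
options = Options()
options.add_argument('--ignore-certificate-errors')
driver = webdriver.Chrome(options=options)  # Selenium 4.6+ locates chromedriver itself
driver.maximize_window()

# urls: list of league URLs collected earlier (not shown in the post)
for single_url in urls:
    # Random 4-6 s pause so the site is not hit at a fixed rate
    time.sleep(random.uniform(4, 6))
    driver.get(single_url)

    # Redirect check: skip leagues that came back as the 404 page
    if '404 - Page not found' in driver.page_source:
        leagueFinal = re.findall('fotbal/(.*?)/', single_url)
        print(str(leagueFinal) + ' 404 - Page not found')
        continue

    try:
        # Wait up to 25 s for an expandable league header to become clickable
        WebDriverWait(driver, 25).until(EC.element_to_be_clickable(
            (By.XPATH, ".//h3[contains(@data-params, 'hideShowEvents')]")))
    except TimeoutException:
        pass  # carry on and scrape whatever has loaded

    # Expand the collapsed event groups so their odds get rendered
    # (find_elements_by_xpath was removed in Selenium 4.3)
    unloadedOdds = driver.find_elements(
        By.XPATH, ".//h3[contains(@data-params, 'loadExpandEvents')]")
    for clicking in unloadedOdds:
        clicking.click()
        time.sleep(random.uniform(0.5, 1.0))

    matchArr = []
    leaguer = single_url

    htmlSourceOrig = driver.page_source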
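Since a 404 can reportedly last an hour or more, immediate retries may not help; an alternative is to collect the leagues that failed and revisit them in later passes. This is a hypothetical sketch: `load_ok` stands for a wrapper around the loop body that returns False where the code prints the 404 message, and the 10-minute pause is an arbitrary assumption.

```python
import time


def scrape_in_passes(urls, load_ok, passes=3, pause=600, sleep=time.sleep):
    """Retry failed URLs in later passes instead of immediately.

    load_ok(url) -> True if the page loaded with data, False on a 404.
    Returns the URLs that still failed after the last pass.
    """
    pending = list(urls)
    for n in range(passes):
        failed = [url for url in pending if not load_ok(url)]
        if not failed:
            return []
        pending = failed
        if n < passes - 1:
            sleep(pause)  # give the site time before the next pass
    return pending
```

Whatever is returned after the last pass can be logged and scraped in a later run.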


Messages In This Thread
Scrapping javascript website with Selenium where pages randomly fail to load - by JuanJuan - Dec-25-2019, 01:35 PM
