Python Forum
selenium timeout
#1
I have a Selenium program that is archiving the last 1500 threads of the forum. At around 818 threads it times out. The method that gets run for each URL to archive a thread is...
    def archive_url(self, url):
        # Load the Wayback Machine front page and wait for the "Save Page Now" widget
        self.browser.get('https://web.archive.org/')
        WebDriverWait(self.browser, 10).until(EC.presence_of_element_located((By.ID, "web_save_div")))
        # Click into the save-page input, type the URL, and submit
        self.browser.find_element_by_xpath("/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/input").click()
        self.browser.find_element_by_class_name('web-save-url-input').send_keys(url)
        self.delay()
        self.browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/button').click()
        # Wait for the Wayback toolbar URL field, which appears once the capture is done
        WebDriverWait(self.browser, 10).until(EC.presence_of_element_located((By.ID, "wmtbURL")))
        print(f'Archived: {url}')
Error:
Traceback (most recent call last):
  File "archive_forum.py", line 213, in <module>
  File "archive_forum.py", line 177, in __init__
    def archive_url(self, url):
  File "archive_forum.py", line 187, in archive_url
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
At first I assumed the wait needed to be longer than 10 seconds, but it consistently times out only after doing 818-ish threads.
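For now I could wrap each call in a retry, something like this untested sketch (archive_with_retry and the three attempts are just an idea; TimeoutException comes from selenium.common.exceptions):

    def archive_with_retry(self, url, attempts=3):
        # Retry the capture a few times before giving up on this URL
        for attempt in range(1, attempts + 1):
            try:
                self.archive_url(url)
                return True
            except TimeoutException:
                print(f'Timeout on {url} (attempt {attempt}/{attempts}), retrying')
        print(f'Giving up on {url}')
        return False

but I'd rather understand why it dies at roughly the same point every time.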
#2
I usually set the timeout of WebDriverWait to 50 seconds. Although 10 should be sufficient, it sometimes is not, and as soon as the condition is True the wait ends anyway.
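For example (assuming browser is your webdriver instance, and using the locator from your code):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# until() returns as soon as the element appears, so the generous
# ceiling costs nothing on the fast runs
WebDriverWait(browser, 50).until(EC.presence_of_element_located((By.ID, "wmtbURL")))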
#3
I still get a timeout with 50 seconds. It only takes about 6 seconds between each URL save, though. I think I'll just go back to time.sleep, as that was working perfectly fine. What is weird is that the time.sleep was only for 1.5 seconds.

EDIT:
The timeout I got last time was for this line:
Quote:WebDriverWait(self.browser, 50).until(EC.presence_of_element_located((By.ID,"wmtbURL")))
which comes after the archive is already done. I guess I probably won't even need it, since it replaced a time.sleep(1.5) and there's nothing left to wait for at that point.
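If I keep it at all, I could make it non-fatal, something like this untested sketch (TimeoutException again from selenium.common.exceptions):

        try:
            # Best-effort confirmation; the capture was already submitted,
            # so a timeout here isn't worth aborting the whole run
            WebDriverWait(self.browser, 10).until(EC.presence_of_element_located((By.ID, "wmtbURL")))
        except TimeoutException:
            print(f'No confirmation for {url}, continuing anyway')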
#4
Quote:What is weird is that the time.sleep was only for 1.5 seconds
That is weird. I suppose if you wanted to dig you could find out why, but it's probably not worth the effort.
#5
Well, I guess I take that back. I get a timeout error with just time.sleep too.
Error:
Traceback (most recent call last):
  File "archive_forum.py", line 208, in <module>
    App()
  File "archive_forum.py", line 175, in __init__
    self.archive_url(url)
  File "archive_forum.py", line 184, in archive_url
    self.browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/button').click()
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py", line 78, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py", line 499, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 297, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=63.0.3239.108)
  (Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.4.0-141-generic x86_64)
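If this is the driver's own page-load timeout firing on the click (just a guess from the traceback), something like this might dodge it (set_page_load_timeout is standard Selenium; the 120 seconds is arbitrary):

        # Raise the driver-side ceiling for slow page loads
        self.browser.set_page_load_timeout(120)
        try:
            self.browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/button').click()
        except TimeoutException:
            # archive.org occasionally stalls; skip this URL rather than crash
            print(f'Driver timeout while submitting {url}, skipping')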
#6
It looks like it's not seeing the By.ID "wmtbURL" element.
I'll try pulling it up in a debugger and see where it's hanging (maybe).


I'm wondering if you are running into an issue that I recently encountered, where the buttons that need to be clicked are off the page and need to be scrolled to. Here's some working code where that's exactly what happened, and the only indication I had was a timeout.

This might not have anything at all to do with your issue. I struggled with it for a couple of days before realizing what was going on.

code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.common import exceptions
from itertools import permutations
from bs4 import BeautifulSoup
import BusinessPaths
import time
import PrettifyPage
import string
import sys


class GetArkansas:
    def __init__(self):
        self.bpath = BusinessPaths.BusinessPaths()
        self.pp = PrettifyPage.PrettifyPage()

        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        self.browser = webdriver.Firefox(capabilities=caps)

        self.get_all_data()

    def get_all_data(self):
        end = False
        # 3 characters satisfies minimum search requirements
        al = f'{string.ascii_uppercase}-&0123456789 '
        for letter1 in al:
            if end:
                break
            for letter2 in al:
                if end:
                    break
                for letter3 in al:
                    sitem = f'{letter1}{letter2}{letter3}'
                    print(f'initial value: {sitem}')
                    # Arbitrary stop point for this partial run
                    if sitem == 'BAA':
                        end = True
                        break
                    alph = [''.join(p) for p in permutations(sitem)]
                    for entry in alph:
                        self.get_data(entry)
        self.browser.close()

    def get_data(self, searchitem):
        mainfilename = self.bpath.htmlpath / f'mainpage_{searchitem}.html'
        if mainfilename.exists():
            return None

        arkansas_url = 'https://www.sos.arkansas.gov/corps/search_all.php'
        self.browser.get(arkansas_url)
        time.sleep(2)
        mainsrc = self.browser.page_source
        soup = BeautifulSoup(mainsrc,"lxml")
        with mainfilename.open('w') as fp:
            fp.write(self.pp.prettify(soup, 2))

        # This gets first page
        search_box = self.browser.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/form/table/tbody/tr[4]/td[2]/font/input')
        search_box.clear()
        search_box.send_keys(searchitem)
        self.browser.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/form/table/tbody/tr[11]/td/font/input').click()
        time.sleep(3)
        src = self.browser.page_source
        if 'There were no records found!' in src:
            print(f'There are no records for {searchitem}')
            return None
        print(f'got page 1 - {searchitem}')
        soup = BeautifulSoup(src,"lxml")
        filename = self.bpath.htmlpath / f'results_{searchitem}page1.html'
        with filename.open('w') as fp:
            fp.write(self.pp.prettify(soup, 2))
        page = 2

        while True:
            try:
                # Scroll to the bottom of the page so the paging links are in view
                height = self.browser.execute_script("return document.documentElement.scrollHeight")
                self.browser.execute_script(f"window.scrollTo(0, {height});")
                # Next line fails on third page!
                # mainContent > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(3) > font:nth-child(1) > a:nth-child(1)
                # page_button_xpath = f'/html/body/div[2]/div/div[2]/div/table[4]/tbody/tr/td[{page}]/font/a'
                next_page = self.browser.find_element(By.PARTIAL_LINK_TEXT, 'Next 250')
                # .perform() is required; without it the move is only queued, never executed
                ActionChains(self.browser).move_to_element(next_page).perform()
                next_page.click()
                time.sleep(2)
                print(f'got page {page} - {searchitem}')
                src = self.browser.page_source
                soup = BeautifulSoup(src,"lxml")
                filename = self.bpath.htmlpath / f'results_{searchitem}page{page}.html'
                with filename.open('w') as fp:
                    fp.write(self.pp.prettify(soup, 2))
                page += 1
            except exceptions.NoSuchElementException:
                break
        # sys.exit(0)

if __name__ == '__main__':
    GetArkansas()
Look at the section beginning here:

        while True:
            try:
                height = self.browser.execute_script("return document.documentElement.scrollHeight")
This determines the height of the page and scrolls until the buttons are visible, then moves to the button and clicks it.
If you try to run this, I'll either have to post the two external files, BusinessPaths and PrettifyPage, or you'll have to comment them out.
I think the scroll code is fairly straightforward, so you probably won't need the other files.
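An even shorter way to do the same scroll, if it helps (scrollIntoView is plain JavaScript; substitute whatever element you need to click):

next_page = self.browser.find_element(By.PARTIAL_LINK_TEXT, 'Next 250')
# Scroll the target element itself into view instead of computing page height
self.browser.execute_script("arguments[0].scrollIntoView(true);", next_page)
next_page.click()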
#7
No, I mean that last error is with all of the WebDriverWaits replaced with time.sleeps.
#8
I've been coding since 3 A.M. and am getting weird. That's 20 hours straight. Time to quit; I don't think I'm making any sense!
#9
Those are the best moments.

