Python Forum
selenium timeout
#1
I have a Selenium program that is archiving the last 1500 threads of the forum. At around 818 threads it times out. The method that gets run for each URL to archive a thread is...
    def archive_url(self, url):
        # Load the Wayback Machine front page and wait for the "Save Page Now" widget
        self.browser.get('https://web.archive.org/')
        WebDriverWait(self.browser, 10).until(EC.presence_of_element_located((By.ID, "web_save_div")))
        # Click into the save-page input, type the URL, and submit
        self.browser.find_element_by_xpath("/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/input").click()
        self.browser.find_element_by_class_name('web-save-url-input').send_keys(url)
        self.delay()
        self.browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/button').click()
        # Wait for the Wayback toolbar URL field, which appears once the capture is done
        WebDriverWait(self.browser, 10).until(EC.presence_of_element_located((By.ID, "wmtbURL")))
        print(f'Archived: {url}')
Error:
Traceback (most recent call last):
  File "archive_forum.py", line 213, in <module>
  File "archive_forum.py", line 177, in __init__
    def archive_url(self, url):
  File "archive_forum.py", line 187, in archive_url
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
At first I assumed the wait needed to be longer than 10 seconds, but it consistently times out only after doing 818-ish threads.
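For now I could wrap each call in a retry, something like this untested sketch (archive_with_retry and the three attempts are just an idea; TimeoutException comes from selenium.common.exceptions):

    def archive_with_retry(self, url, attempts=3):
        # Retry the capture a few times before giving up on this URL
        for attempt in range(1, attempts + 1):
            try:
                self.archive_url(url)
                return True
            except TimeoutException:
                print(f'Timeout on {url} (attempt {attempt}/{attempts}), retrying')
        print(f'Giving up on {url}')
        return False

but I'd rather understand why it dies at roughly the same point every time.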
#2
I usually set the timeout of WebDriverWait to 50 seconds. Although 10 should be sufficient, it sometimes is not, and as soon as the condition is True the wait ends anyway.
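For example (assuming browser is your webdriver instance, and using the locator from your code):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# until() returns as soon as the element appears, so the generous
# ceiling costs nothing on the fast runs
WebDriverWait(browser, 50).until(EC.presence_of_element_located((By.ID, "wmtbURL")))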
#3
I still get a timeout with 50 seconds. It only takes about 6 seconds between each URL save, though. I think I'll just go back to time.sleep, as that was working perfectly fine. What is weird is that the time.sleep was only for 1.5 seconds.

EDIT:
The timeout I got last time was for this line:
Quote:WebDriverWait(self.browser, 50).until(EC.presence_of_element_located((By.ID,"wmtbURL")))
which comes after the archive is already done. I guess I probably won't even need it, since it replaced a time.sleep(1.5) and there's nothing left to wait for at that point.
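If I keep it at all, I could make it non-fatal, something like this untested sketch (TimeoutException again from selenium.common.exceptions):

        try:
            # Best-effort confirmation; the capture was already submitted,
            # so a timeout here isn't worth aborting the whole run
            WebDriverWait(self.browser, 10).until(EC.presence_of_element_located((By.ID, "wmtbURL")))
        except TimeoutException:
            print(f'No confirmation for {url}, continuing anyway')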
#4
Quote:What is weird is that the time.sleep was only for 1.5 seconds
That is weird. I suppose if you wanted to dig you could find out why, but it's probably not worth the effort.
#5
Well, I guess I take that back. I get a timeout error with just time.sleep too.
Error:
Traceback (most recent call last):
  File "archive_forum.py", line 208, in <module>
    App()
  File "archive_forum.py", line 175, in __init__
    self.archive_url(url)
  File "archive_forum.py", line 184, in archive_url
    self.browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/button').click()
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py", line 78, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webelement.py", line 499, in _execute
    return self._parent.execute(command, params)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 297, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout
  (Session info: headless chrome=63.0.3239.108)
  (Driver info: chromedriver=2.33.506092 (733a02544d189eeb751fe0d7ddca79a0ee28cce4),platform=Linux 4.4.0-141-generic x86_64)
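If this is the driver's own page-load timeout firing on the click (just a guess from the traceback), something like this might dodge it (set_page_load_timeout is standard Selenium; the 120 seconds is arbitrary):

        # Raise the driver-side ceiling for slow page loads
        self.browser.set_page_load_timeout(120)
        try:
            self.browser.find_element_by_xpath('/html/body/div[3]/div/div[3]/div/div[2]/div[3]/div[2]/form/button').click()
        except TimeoutException:
            # archive.org occasionally stalls; skip this URL rather than crash
            print(f'Driver timeout while submitting {url}, skipping')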
#6
It looks like it's not seeing the By.ID "wmtbURL" element.
I'll try pulling it up in a debugger and see where it's hanging (maybe).


I'm wondering if you are running into an issue that I recently encountered, where the buttons that need to be clicked are off the page and need to be scrolled to. Here's some working code where that's exactly what happened, and the only indication I had was a timeout.

This might not have anything at all to do with your issue. I struggled with it for a couple of days before realizing what was going on.

code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.common import exceptions
from itertools import permutations
from bs4 import BeautifulSoup
import BusinessPaths
import time
import PrettifyPage
import string
import sys


class GetArkansas:
    def __init__(self):
        self.bpath = BusinessPaths.BusinessPaths()
        self.pp = PrettifyPage.PrettifyPage()

        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        self.browser = webdriver.Firefox(capabilities=caps)

        self.get_all_data()

    def get_all_data(self):
        end = False
        # 3 characters satisfies minimum search requirements
        al = f'{string.ascii_uppercase}-&0123456789 '
        for letter1 in al:
            if end:
                break
            for letter2 in al:
                if end:
                    break
                for letter3 in al:
                    sitem = f'{letter1}{letter2}{letter3}'
                    print(f'initial value: {sitem}')
                    # Arbitrary stop point for this partial run
                    if sitem == 'BAA':
                        end = True
                        break
                    alph = [''.join(p) for p in permutations(sitem)]
                    for entry in alph:
                        self.get_data(entry)
        self.browser.close()

    def get_data(self, searchitem):
        mainfilename = self.bpath.htmlpath / f'mainpage_{searchitem}.html'
        if mainfilename.exists():
            return None

        arkansas_url = 'https://www.sos.arkansas.gov/corps/search_all.php'
        self.browser.get(arkansas_url)
        time.sleep(2)
        mainsrc = self.browser.page_source
        soup = BeautifulSoup(mainsrc,"lxml")
        with mainfilename.open('w') as fp:
            fp.write(self.pp.prettify(soup, 2))

        # This gets first page
        search_box = self.browser.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/form/table/tbody/tr[4]/td[2]/font/input')
        search_box.clear()
        search_box.send_keys(searchitem)
        self.browser.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/form/table/tbody/tr[11]/td/font/input').click()
        time.sleep(3)
        src = self.browser.page_source
        if 'There were no records found!' in src:
            print(f'There are no records for {searchitem}')
            return None
        print(f'got page 1 - {searchitem}')
        soup = BeautifulSoup(src,"lxml")
        filename = self.bpath.htmlpath / f'results_{searchitem}page1.html'
        with filename.open('w') as fp:
            fp.write(self.pp.prettify(soup, 2))
        page = 2

        while True:
            try:
                # Scroll to the bottom of the page so the paging links are in view
                height = self.browser.execute_script("return document.documentElement.scrollHeight")
                self.browser.execute_script(f"window.scrollTo(0, {height});")
                # Next line fails on third page!
                # mainContent > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(3) > font:nth-child(1) > a:nth-child(1)
                # page_button_xpath = f'/html/body/div[2]/div/div[2]/div/table[4]/tbody/tr/td[{page}]/font/a'
                next_page = self.browser.find_element(By.PARTIAL_LINK_TEXT, 'Next 250')
                # .perform() is required; without it the move is only queued, never executed
                ActionChains(self.browser).move_to_element(next_page).perform()
                next_page.click()
                time.sleep(2)
                print(f'got page {page} - {searchitem}')
                src = self.browser.page_source
                soup = BeautifulSoup(src,"lxml")
                filename = self.bpath.htmlpath / f'results_{searchitem}page{page}.html'
                with filename.open('w') as fp:
                    fp.write(self.pp.prettify(soup, 2))
                page += 1
            except exceptions.NoSuchElementException:
                break
        # sys.exit(0)

if __name__ == '__main__':
    GetArkansas()
Look at the section beginning here:

        while True:
            try:
                height = self.browser.execute_script("return document.documentElement.scrollHeight")
This determines the height of the page and scrolls until the buttons are visible, then moves to the button and clicks it.
If you try to run this, I'll either have to post the two external files, BusinessPaths and PrettifyPage, or you'll have to comment them out.
I think the scroll code is fairly straightforward, so you probably won't need the other files.
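An even shorter way to do the same scroll, if it helps (scrollIntoView is plain JavaScript; substitute whatever element you need to click):

next_page = self.browser.find_element(By.PARTIAL_LINK_TEXT, 'Next 250')
# Scroll the target element itself into view instead of computing page height
self.browser.execute_script("arguments[0].scrollIntoView(true);", next_page)
next_page.click()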
#7
No, I mean that last error is with all of the WebDriverWaits replaced with time.sleeps.
#8
I've been coding since 3 A.M. and am getting weird. That's 20 hours straight. Time to quit; I don't think I'm making any sense!
#9
Those are the best moments.

