Python Forum

Full Version: selenium wait for text question
I am trying to wait for certain text to show up on the web page after I click the 'Next' button.
When pageno == 1, the code works as expected.
When pageno != 1, I don't know how to wait for the text of the <li> tag to contain expected_text, which would assure me that
the page I am loading is fully loaded before continuing.
The page does actually get loaded, but since I am only waiting for the ".pageinfo" CSS selector, which is already present from the previous page,
the code continues too soon. How do I get it to wait until the text of the <li> tag contains expected_text?

code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from pathlib import Path
import os
import sys


class ByCSS_SELECTOR:
    def __init__(self) -> None:
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        self.HomePath = Path(".")

        self.NewHampshireBusinessListing = 'https://quickstart.sos.nh.gov/online/BusinessInquire/LandingPageBusinessSearch'
        self.browser = None
        self.browser_running = False
        self.page = None
    
    def find_page(self, pageno):
        if not self.browser_running:
            self.start_browser()

        if pageno == 1:
            self.browser.get(self.NewHampshireBusinessListing)
            try:
                elements = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(self.browser, 5). \
                    until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".pageinfo")))]
                print(f"found: {elements}")
                self.page = self.browser.page_source
            except TimeoutException:
                print("Query timed out")
        else:
            self.browser.find_element(By.CSS_SELECTOR, "li.next:nth-child(9) > a:nth-child(1)").click()
            try:
                # f-string so that pageno is actually interpolated (not yet used by the wait below)
                expected_text = f"Page {pageno} of"
                elements = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(self.browser, 5). \
                    until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".pageinfo")))]
                print(f"Length elements: {len(elements)}")
                print(f"elements: {elements}")
                self.page = self.browser.page_source
            except TimeoutException:
                print("Query timed out")

    def start_browser(self):
        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        self.browser = webdriver.Firefox(capabilities=caps)
        self.browser_running = True

    def stop_browser(self):
        self.browser.close()
        self.browser_running = False


def main():
    bcs = ByCSS_SELECTOR()
    bcs.start_browser()
    bcs.find_page(1)
    bcs.find_page(2)
    if bcs.browser_running:
        bcs.stop_browser()


if __name__ == '__main__':
    main()
I am a beginner with Python and Selenium, but if JavaScript is updating the page then you probably need to wait for a DocumentComplete event. I forget which event fires at the start of the download that eventually results in DocumentComplete, but perhaps that will help you find what you need.
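For what it's worth, the nearest thing Selenium offers to a document-complete event is polling document.readyState from script. Below is a minimal sketch, assuming driver is an already started WebDriver; the helper names document_is_complete and wait_for_document are made up for illustration. Note that readyState only reflects the initial document load, so content added later by script may still be missing when it reports "complete".

```python
def document_is_complete(driver):
    """Wait condition: True once the browser reports the document as loaded."""
    # document.readyState is "loading", "interactive" or "complete".
    return driver.execute_script("return document.readyState") == "complete"


def wait_for_document(driver, timeout=10):
    """Block until readyState is "complete"; raises TimeoutException otherwise."""
    # Imported here so document_is_complete() can be tried without a browser.
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(document_is_complete)
```

Usage would be wait_for_document(self.browser) right after the click, but on its own it does not prove that the new page's data has arrived.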
The problem is that Selenium can't detect redirects or updates made by running script or AJAX; it doesn't wait for them to finish.
In addition, you can't detect them via readyState: it waits for a while, but signals complete long before the AJAX content has been downloaded.
The most reliable way to know is to check the line where the Next button is located.
It changes for each loaded page in a way I know in advance, hence the expected_text assignment (line 37 of the code).
This text is inserted into the HTML after the page has been fetched, and is contained in the
<li class="pageinfo">
    Page 1 of 28581, records 1 to 25 of 714504
</li>
and therefore is what I want to wait for.
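To make the structure of that text concrete, here is a small sketch that pulls the page numbers out of it. The regular expression and the name parse_pageinfo are assumptions for illustration, based only on the sample text shown above.

```python
import re

# Matches the leading "Page <current> of <total>" part of the pager text.
PAGEINFO_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)")


def parse_pageinfo(text):
    """Return (current_page, total_pages), or None if the text does not match."""
    match = PAGEINFO_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = """
    Page 1 of 28581, records 1 to 25 of 714504
"""
print(parse_pageinfo(sample))
```
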
All I need to identify is the 'Page n' part.
My question was how to wait specifically for that text value in the <li> tag.
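One way this could be done is with Selenium's built-in expected condition text_to_be_present_in_element, which polls an element until its text contains a given substring. The following is a sketch only, assuming driver is the running browser and that ".pageinfo" still identifies the pager <li>; the helper names pager_text and wait_for_page are made up for illustration.

```python
def pager_text(pageno):
    """Text the pager should contain once page `pageno` has been rendered."""
    # The trailing " of" keeps "Page 2 of" from matching "Page 20 of ...".
    return f"Page {pageno} of"


def wait_for_page(driver, pageno, timeout=10):
    """Block until the .pageinfo element shows the requested page number.

    Raises selenium.common.exceptions.TimeoutException if it never does.
    """
    # Imported here so pager_text() can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, ".pageinfo")
    return WebDriverWait(driver, timeout).until(
        EC.text_to_be_present_in_element(locator, pager_text(pageno))
    )
```

In find_page this would go right after the click on the Next link, for example wait_for_page(self.browser, pageno), and only then read self.browser.page_source. The condition looks the element up again on every poll, so it should not matter whether the site replaces the <li> or just rewrites its text.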