 Difficult web page -- Selenium
#1
I am trying to scrape a web page that's giving me a bit of trouble.
There is no issue loading the first and second pages, but the second
page has a search box whose XPath is different each time the page is loaded.

I am able to find the search box by searching for the <input> tag on the page.

The problem lies in entering text into the search box and getting a result.

Here's a snippet of the code, which almost works (except for the issue above):
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time


class GetOregonBusinessLists:
    def __init__(self):
        self.parse_BusinessLists(searchitem='Active Businesses LLC', savename='ActiveLLC.html')
    
    def parse_BusinessLists(self, searchitem, savename):
        mainurl = 'https://data.oregon.gov'
        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        browser = webdriver.Firefox(capabilities=caps)
        browser.get(mainurl)
        time.sleep(5)
        data_catalog = browser.find_element(By.XPATH, '/html/body/div[2]/div/div[5]/div/div[2]/div/div[1]/div[2]/div[2]/div/a/div/div/p[2]')
        hover = ActionChains(browser).move_to_element(data_catalog)
        hover.perform()
        data_catalog.click()
        time.sleep(5)

        # Fetching search box by -- tag name -- as xpath changes with each access
        inputElement = browser.find_element(By.TAG_NAME, 'input')
        attrs = browser.execute_script('var items = {}; for (index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', inputElement)
        print(f'attrs: {attrs}')

        inputElement.send_keys(searchitem)
        inputElement.send_keys(Keys.RETURN)

        time.sleep(5)  # give the results page time to load before reading it
        page = browser.page_source
        browser.quit()
        print(page)


if __name__ == '__main__':
    GetOregonBusinessLists()
Error:
attrs: {'autocomplete': 'off', 'class': 'autocomplete-input _-_-_-common-autocomplete-components-SearchBox-_search-box-module_search-box-static-mobile_ksx6z', 'id': 'autocomplete-search-input-18721', 'placeholder': 'Search', 'type': 'search', 'value': ''}
Traceback (most recent call last):
  File "/media/larz60/Data-1TB/Projects/BusinessListings/src/Oregon/ForForum.py", line 41, in <module>
    GetOregonBusinessLists()
  File "/media/larz60/Data-1TB/Projects/BusinessListings/src/Oregon/ForForum.py", line 11, in __init__
    self.parse_BusinessLists(searchitem='Active Businesses LLC', savename='ActiveLLC.html')
  File "/media/larz60/Data-1TB/Projects/BusinessListings/src/Oregon/ForForum.py", line 31, in parse_BusinessLists
    inputElement.send_keys(searchitem)
  File "/media/larz60/Data-1TB/Projects/BusinessListings/business_venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 479, in send_keys
    'value': keys_to_typing(value)})
  File "/media/larz60/Data-1TB/Projects/BusinessListings/business_venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/media/larz60/Data-1TB/Projects/BusinessListings/business_venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/media/larz60/Data-1TB/Projects/BusinessListings/business_venv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotInteractableException: Message: Element <input id="autocomplete-search-input-18721" class="autocomplete-input _-_-_-common-autocomplete-components-SearchBox-_search-box-module_search-box-static-mobile_ksx6z" type="search"> is not reachable bykeyboar
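Separately from the locator problem, the `savename` argument ('ActiveLLC.html') is passed in but never used. Assuming the intent is to save the results page to that file, a minimal sketch of the saving step could look like this; `save_page` is a hypothetical helper name, not part of Selenium.

```python
from pathlib import Path


def save_page(page_source: str, savename: str) -> Path:
    """Write HTML captured from browser.page_source to savename.

    Hypothetical helper: the original snippet accepts savename but
    never writes anything to it.
    """
    path = Path(savename)
    # Selenium returns page_source as a str, so write it as UTF-8 text.
    path.write_text(page_source, encoding="utf-8")
    return path
```

Inside `parse_BusinessLists` it would be called right after `page_source` is read, e.g. `save_page(page, savename)`.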
#2
(Dec-31-2018, 05:18 PM) Larz60+ wrote: the second page has a search box which has a different XPath each time called
It sounds like you need a more generic XPath expression.
Depending on how exact you want to be, one that might work is //div[@class="siteContentWrapper"]//input.
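The idea behind that expression is to anchor on a stable container and then search at any depth below it with `//input`, instead of spelling out every step from `/html/body` the way an absolute XPath does. To see the difference without a browser, here is a sketch using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The markup is a made-up, simplified stand-in for the real page; in particular, placing the mobile search box in a header outside `siteContentWrapper` is an assumption. Selenium itself evaluates the XPath in the browser, so this only illustrates the path logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: a mobile search box in the header
# (assumed not displayed on desktop) and the real search box nested
# inside the siteContentWrapper container.
PAGE = """
<html>
  <body>
    <header>
      <input id="autocomplete-search-input-101" type="search" class="mobile"/>
    </header>
    <div class="siteContentWrapper">
      <div><div>
        <input id="autocomplete-search-input-202" type="search" class="desktop"/>
      </div></div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# First <input> in document order: the same element a plain
# tag-name search returns, i.e. the header (mobile) box.
first_input = root.find(".//input")

# Scoped, relative path: any <input> at any depth below the wrapper div.
# ElementTree needs the leading "." where Selenium accepts "//div...".
scoped_inputs = root.findall(".//div[@class='siteContentWrapper']//input")

print(first_input.get("class"))                   # mobile
print([el.get("class") for el in scoped_inputs])  # ['desktop']
```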

I'm no expert when it comes to Selenium (I try to avoid it if possible), but I believe your code is using the first input element on the page, which is a hidden one.
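The element printed in the first post has a class containing `search-box-static-mobile`, which suggests (an inference from the class name only) that the first `<input>` is the mobile-layout search box that the desktop layout does not display. Besides a tighter XPath, another option is to collect every `<input>` and keep the first one Selenium reports as displayed. A minimal sketch, where `first_displayed` is a hypothetical helper; it only relies on each element having an `is_displayed()` method, which Selenium's `WebElement` provides.

```python
from typing import Iterable, Optional, Protocol


class Displayable(Protocol):
    """Anything with is_displayed(), e.g. a Selenium WebElement."""

    def is_displayed(self) -> bool: ...


def first_displayed(elements: Iterable[Displayable]) -> Optional[Displayable]:
    """Return the first element that is currently displayed, or None.

    Hypothetical helper: a tag-name lookup returns the first <input> in
    document order even when it is hidden, so filter on visibility.
    """
    for element in elements:
        if element.is_displayed():
            return element
    return None


# Intended use inside parse_BusinessLists (needs a live browser):
#     inputs = browser.find_elements(By.TAG_NAME, 'input')
#     inputElement = first_displayed(inputs)
#     if inputElement is None:
#         raise RuntimeError('no visible search box found')
#     inputElement.send_keys(searchitem, Keys.RETURN)
```

The same helper can also replace a fixed `time.sleep(5)`: `WebDriverWait(browser, 10).until(lambda d: first_displayed(d.find_elements(By.TAG_NAME, 'input')))` keeps polling until a visible input appears and returns it (this needs `from selenium.webdriver.support.ui import WebDriverWait`).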

The resulting search URLs seem pretty straightforward, so it would probably be easier to just generate them instead of going through the trouble of automating the browser to search.
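A sketch of generating the search URL directly. The `/browse` path and the `q` parameter are assumptions (they follow the catalog-search pattern used by Socrata-style open-data portals); check them against the address bar after doing one search by hand.

```python
from urllib.parse import urlencode

# Assumption: catalog searches on the portal look like /browse?q=<terms>.
BASE_URL = "https://data.oregon.gov/browse"


def build_search_url(search_terms: str) -> str:
    """Return the assumed catalog search URL (spaces encoded as '+')."""
    return f"{BASE_URL}?{urlencode({'q': search_terms})}"


print(build_search_url("Active Businesses LLC"))
# https://data.oregon.gov/browse?q=Active+Businesses+LLC
```

The resulting URL can be passed straight to `browser.get(...)`, skipping the hover, click, and search-box steps entirely.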
#3
I didn't see the hidden input tag.
That did the trick!
Thanks!