Thread: Difficult web page -- Selenium (Python Forum, Web Scraping & Web Development, /thread-15039.html)

Difficult web page -- Selenium - Larz60+ - Dec-31-2018

I am trying to scrape a web page that's giving me a bit of trouble. There is no issue loading the first and second pages, but the second page has a search box whose XPath is different each time the page is loaded. I am able to find the search box by searching for the <input> tag on the page. The problem lies in entering text into the search box and getting a result.

Here's an almost working (except for the above issue) snippet of the code:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.action_chains import ActionChains
    import time


    class GetOregonBusinessLists:
        def __init__(self):
            self.parse_BusinessLists(searchitem='Active Businesses LLC', savename='ActiveLLC.html')

        def parse_BusinessLists(self, searchitem, savename):
            mainurl = 'https://data.oregon.gov'
            caps = webdriver.DesiredCapabilities().FIREFOX
            caps["marionette"] = True
            browser = webdriver.Firefox(capabilities=caps)
            browser.get(mainurl)
            time.sleep(5)
            data_catalog = browser.find_element(By.XPATH, '/html/body/div[2]/div/div[5]/div/div[2]/div/div[1]/div[2]/div[2]/div/a/div/div/p[2]')
            hover = ActionChains(browser).move_to_element(data_catalog)
            hover.perform()
            data_catalog.click()
            time.sleep(5)
            # Fetching search box by -- tag name -- as xpath changes with each access
            inputElement = browser.find_element_by_tag_name('input')
            attrs = browser.execute_script('var items = {}; for (index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', inputElement)
            print(f'attrs: {attrs}')
            inputElement.send_keys(searchitem)
            inputElement.send_keys(Keys.RETURN)
            page = str(browser.page_source)
            time.sleep(5)
            browser.close()
            print(page)


    if __name__ == '__main__':
        GetOregonBusinessLists()

RE: Difficult web page -- Selenium - stranac - Dec-31-2018

(Dec-31-2018, 05:18 PM)Larz60+ Wrote: the second page has a search box which has a different XPath each time called

Sounds like you need a more generic XPath expression. One that might work (depending on how exact you want to be) is //div[@class="siteContentWrapper"]//input.

I'm no expert when it comes to Selenium (I try to avoid it if possible), but I believe your code uses the first input element on the page, which is a hidden one.

The resulting search URLs seem pretty straightforward, so it would probably be easier to just generate them instead of going through the trouble of automating the browser for searching.

RE: Difficult web page -- Selenium - Larz60+ - Dec-31-2018

I didn't see the hidden input tag. That did the trick! Thanks!
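
For reference, here is a minimal sketch of the two suggestions in this thread: skipping hidden inputs and generating the search URL directly. It is an illustration under assumptions, not the code that was actually used. The helper names `first_visible` and `build_search_url` are hypothetical, and the `/browse` path and `q` parameter are guesses at the URL pattern, which the thread does not spell out; check them against a real search in the browser before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.oregon.gov"  # site named in the original post


def first_visible(elements):
    """Return the first element that reports itself as displayed, or None.

    Works with anything exposing is_displayed(), such as Selenium WebElements.
    This avoids grabbing a hidden <input> that happens to come first in the DOM.
    """
    return next((el for el in elements if el.is_displayed()), None)


def build_search_url(term, base=BASE_URL, path="/browse", param="q"):
    """Build a search URL without driving a browser.

    ASSUMPTION: `path` and `param` are hypothetical placeholders for the
    site's real search URL pattern.
    """
    return f"{base}{path}?{urlencode({param: term})}"


if __name__ == "__main__":
    print(build_search_url("Active Businesses LLC"))
```

With Selenium, the first helper could be used as `first_visible(browser.find_elements(By.XPATH, '//div[@class="siteContentWrapper"]//input'))`, combining the more generic XPath with a visibility check before calling `send_keys`.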