I think the following should be included in the Selenium tutorial.
proper waiting instead of using time.sleep
Sometimes the browser needs to wait while the page is loading; otherwise your code will fail because the content is not there yet. Instead of arbitrarily waiting X seconds with time.sleep, you can use WebDriverWait to wait until, let's say, the element you are looking for exists. That way you neither wait longer than needed nor wait too briefly and fail anyway.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    ...
    WebDriverWait(browser, 3).until(EC.presence_of_element_located((By.ID, 'global-new-tweet-button')))

This will wait for the presence of the element with the ID "global-new-tweet-button". It times out after 3 seconds if the element is not found. You can of course extend this timeout as needed. The presence of an element located by ID is not the only thing we can wait for. Below is the list of built-in methods to search for elements based on circumstances and content.
references
This is a list of convenience methods in Selenium that are commonly used to search for elements
This is a list of locating methods in Selenium that are commonly used to search for elements
You can find the definition of each expected support condition here.
more info:
https://selenium-python.readthedocs.io/waits.html
https://selenium-python.readthedocs.io/l...ments.html
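To make it clearer why an explicit wait beats a fixed sleep, here is a rough pure-Python sketch of the polling idea. This is not Selenium's actual implementation; `wait_until` and its parameters are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=3.0, poll=0.5):
    """Call condition() until it returns something truthy or the timeout passes.

    Hypothetical helper: it returns as soon as the condition holds, so the
    caller never waits the full timeout unless the condition never holds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep only briefly between checks instead of one long fixed sleep
        time.sleep(poll)
```

With Selenium itself you would keep using WebDriverWait(browser, 3).until(...), which raises selenium.common.exceptions.TimeoutException when the condition is never met.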
performing key combos
Sometimes we want to perform key combinations to do things in the browser.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    # Use Keys.COMMAND instead of Keys.CONTROL for Cmd+S on macOS
    ActionChains(browser).key_down(Keys.CONTROL).send_keys("s").key_up(Keys.CONTROL).perform()

In this specific example, Firefox will execute Ctrl+S to bring up the Save As menu.
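The modifier key usually depends on the platform: Keys.COMMAND on macOS, Keys.CONTROL elsewhere. Below is a small sketch of picking it at runtime. The helper names `primary_modifier` and `press_combo` are hypothetical, and the Keys class and the ActionChains object are passed in as arguments so the sketch itself has no Selenium import.

```python
import sys


def primary_modifier(keys, platform=sys.platform):
    """Return keys.COMMAND on macOS and keys.CONTROL on other platforms."""
    return keys.COMMAND if platform == "darwin" else keys.CONTROL


def press_combo(actions, modifier, key):
    """Hold the modifier, send the key, release the modifier, then run the chain."""
    actions.key_down(modifier).send_keys(key).key_up(modifier).perform()


# Assumed usage with Selenium, where browser is your WebDriver instance:
# from selenium.webdriver.common.action_chains import ActionChains
# from selenium.webdriver.common.keys import Keys
# press_combo(ActionChains(browser), primary_modifier(Keys), "s")
```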
switching or opening tabs
Switching tabs is often used as selecting things may bring up data in a whole different tab. Thus we need to switch to and from these tabs.
    # Opens a new tab
    driver.execute_script("window.open()")
    # Switch to the newly opened tab
    driver.switch_to.window(driver.window_handles[1])
    # Navigate to new URL in new tab
    driver.get("https://google.com")
    # Run other commands in the new tab here

You're then able to close the original tab as follows:

    # Switch to original tab
    driver.switch_to.window(driver.window_handles[0])
    # Close original tab
    driver.close()
    # Switch back to newly opened tab, which is now in position 0
    driver.switch_to.window(driver.window_handles[0])

Or close the newly opened tab:

    # Close current tab
    driver.close()
    # Switch back to original tab
    driver.switch_to.window(driver.window_handles[0])

scrolling to the bottom of the page regardless of length
This is for cases where a page does not load all of its content until you scroll, such as Facebook. The function scrolls to the bottom of the page, waits for the rest to load (via time.sleep, be aware), and keeps repeating until it reaches the bottom. It uses time.sleep to stay portable, but you can wait for a specific element on your website instead if you need it to be faster.
    import time

    def scroll_to_bottom(driver):
        # driver = self.browser
        SCROLL_PAUSE_TIME = 0.5

        # Get scroll height
        last_height = driver.execute_script("return document.body.scrollHeight")

        while True:
            # Scroll down to bottom
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

            # Wait to load page
            time.sleep(SCROLL_PAUSE_TIME)

            # Calculate new scroll height and compare with last scroll height
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height

    # call scroll_to_bottom(browser) when you want it to scroll to the bottom of the page

Handle exceptions with built-ins:
Use a try and except to get you where you want to go
    >>> import selenium.common.exceptions as EX
    >>> help(EX)

Output:

    builtins.Exception(builtins.BaseException)
        WebDriverException
            ErrorInResponseException
            ImeActivationFailedException
            ImeNotAvailableException
            InvalidArgumentException
            InvalidCookieDomainException
            InvalidElementStateException
                ElementNotInteractableException
                ElementNotSelectableException
                ElementNotVisibleException
            InvalidSwitchToTargetException
                NoSuchFrameException
                NoSuchWindowException
            MoveTargetOutOfBoundsException
            NoAlertPresentException
            NoSuchAttributeException
            NoSuchElementException
                InvalidSelectorException
            RemoteDriverServerException
            StaleElementReferenceException
            TimeoutException
            UnableToSetCookieException
            UnexpectedAlertPresentException
            UnexpectedTagNameException
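As one possible illustration of using try and except with these exceptions, the sketch below tries several locators in order and falls back to the next one when an element is missing. The helper `first_match` is a hypothetical name, and the exception class is passed in as a parameter (with Selenium you would pass NoSuchElementException) so the sketch has no Selenium import of its own.

```python
def first_match(driver, locators, not_found):
    """Return the first element found by any (by, value) locator, or None.

    not_found is the exception class raised when a lookup fails, for
    example selenium.common.exceptions.NoSuchElementException.
    """
    for by, value in locators:
        try:
            return driver.find_element(by, value)
        except not_found:
            # This locator did not match; try the next one
            continue
    return None


# Assumed usage with Selenium, where browser is your WebDriver instance:
# from selenium.common.exceptions import NoSuchElementException
# from selenium.webdriver.common.by import By
# element = first_match(
#     browser,
#     [(By.ID, "main"), (By.CSS_SELECTOR, "div.content")],
#     NoSuchElementException,
# )
```

Catching the specific exception rather than a bare except keeps unrelated failures, such as a crashed browser session, visible.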
Does the site use JavaScript in the first place?
An easy way to test whether JavaScript is what is blocking you is to turn off JavaScript in your browser and reload the website. If what you are parsing is missing, that is a quick way to determine it is generated by JavaScript, which requires Selenium. Another way is to check the JavaScript source code on the website regarding the element you are parsing. If there is a JavaScript call in the header that builds it, then you will likely need Selenium to parse it.
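The same check can be roughly scripted: download the page without a browser, so no JavaScript runs, and see whether the text you want is already there. This is only a sketch using the standard library; `fetch_raw_html` and `needs_browser` are hypothetical names, and the User-Agent value is an assumption since some sites reject the default one.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url, timeout=10):
    """Download the page as the server sends it; no JavaScript is executed."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def needs_browser(raw_html, expected_text):
    """Return True if expected_text is missing from the raw HTML.

    A missing text hints that the content is rendered by JavaScript,
    although it could also mean the text itself has changed.
    """
    return expected_text not in raw_html


# Assumed usage (the URL and text are placeholders for your own target):
# html = fetch_raw_html("https://example.com")
# print(needs_browser(html, "text you expect to see on the page"))
```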
Search for unique elements
Often you are parsing sites that do not want a bot to parse them. You need to find a unique element for the content you are parsing. If it does not have one, search higher in the HTML for one to use as a point of reference, then work your way down to the exact element. More often than not the ID is unique enough. By far the quickest way is to search for the XPath of the element, but note that websites change over time, which can break the XPath and your code. You will need to update the code as the website changes.
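The "find a unique ancestor, then work down" idea can be sketched without a browser. The example below uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, on a made-up, well-formed page; the markup, the `results` ID, and the helper `texts_under` are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up page: the "box" class is not unique, but the "results" ID is
PAGE = """
<html><body>
  <div class="box"><span>ad</span></div>
  <div id="results">
    <div class="box"><span>first</span></div>
    <div class="box"><span>second</span></div>
  </div>
</body></html>
""".strip()


def texts_under(root, anchor_id, child_path):
    """Find the element with a unique ID, then search only beneath it."""
    anchor = root.find(f".//*[@id='{anchor_id}']")
    if anchor is None:
        return []
    return [element.text for element in anchor.findall(child_path)]


root = ET.fromstring(PAGE)
print(texts_under(root, "results", ".//span"))  # ['first', 'second']
```

With Selenium the same shape would be something like browser.find_element(By.ID, "results").find_elements(By.TAG_NAME, "span"), which skips the unrelated "ad" box because the search starts from the unique ancestor.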