Python Forum - Python Selenium WebDriver Problem

Hello,

I'm currently trying to extract some tables from a particular website, but I have to use the Selenium WebDriver to do so because I believe the page uses JavaScript.

I thought I had found the solution, but the code works sporadically.

Sometimes it'll work no problem, other times it will time out after not being able to find the element id. (Even though I can see and inspect it in the browser)

http://www.basketball-reference.com/boxs...80CHO.html

from selenium import webdriver

from selenium.webdriver.common.by import By
import selenium.webdriver.support.ui as ui
import selenium.webdriver.support.expected_conditions as EC
import os

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
dir_path = os.path.dirname(os.path.realpath(__file__))
chromedriver = dir_path + "/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chrome_options=options, executable_path=chromedriver)

url = 'http://www.basketball-reference.com/boxscores/201703280CHO.html'

driver.get(url)

ui.WebDriverWait(driver, 15).until(EC.visibility_of_element_located((By.ID, "line_score")))

find_table = driver.find_element_by_xpath("//table[@id='line_score']")

Can someone please help me find a concrete way to extract the "line_score" and "four_factors" tables?

I'm only about a week old in Python coding, but I've done a fair bit of research and can't seem to find a solution. From what I've read thus far, certain pages have characteristics where they're constantly re-loading when values change, and so that gets in the way of the element grabbing. Is this what's going on?

Thank you for your time.

http://www.basketball-reference.com/boxs...80CHO.html

It wouldn't let me post the URL in my first post, so here it is.


(Apr-01-2017, 03:19 PM) SlpnGnt wrote: Sometimes it'll work no problem, other times it will time out after not being able to find the element id. (Even though I can see and inspect it in the browser)

I would try putting a time.sleep(1) after you load the page to give it time to actually finish loading. Depending on your internet speed at that moment, if the program is too fast and your bandwidth happens to be slow (for example, on Friday nights when everyone is online), the program will jump ahead before the data is loaded and will not find it. If you still have a problem with a delay of 1 second, I would increase it. I have some cases where the delay has to be 3 seconds for the page to load reliably.

from selenium import webdriver
 
from selenium.webdriver.common.by import By
import selenium.webdriver.support.ui as ui
import selenium.webdriver.support.expected_conditions as EC
import os
import time
 
options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--ignore-ssl-errors')
dir_path = os.path.dirname(os.path.realpath(__file__))
chromedriver = dir_path + "/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chrome_options=options, executable_path=chromedriver)
 
url = 'http://www.basketball-reference.com/boxscores/201703280CHO.html'
 
driver.get(url)
 
ui.WebDriverWait(driver, 15).until(EC.visibility_of_element_located((By.ID, "line_score")))

time.sleep(1)

line_score_table = driver.find_element_by_xpath("//div[@id='all_line_score']")
print(line_score_table.text)

print('\n')

four_factors_table = driver.find_element_by_xpath("//div[@id='all_four_factors']")
print(four_factors_table.text)
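
A fixed sleep is a guess: too short and the lookup still fails, too long and every run pays for it. As a rough alternative, here is a minimal sketch of a polling helper that retries a lookup until it succeeds or a timeout passes. The name wait_until and its timeout and poll parameters are made up for illustration and are not part of Selenium; the idea is similar in spirit to what WebDriverWait does.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses.

    Hypothetical helper, not a Selenium API. Any exception raised by
    condition() is treated as "not ready yet" and the call is retried.
    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # e.g. the element is not in the page yet
            last_error = exc
        if time.monotonic() >= deadline:
            # Chain the last lookup error (if any) to make debugging easier.
            raise TimeoutError(
                "condition not met within {} s".format(timeout)
            ) from last_error
        time.sleep(poll)
```

With the driver from the code above, something like wait_until(lambda: driver.find_element_by_xpath("//div[@id='all_four_factors']").text) should return the table text as soon as it is non-empty, so time.sleep(1) would no longer need tuning by hand. Treat that as an untested suggestion for this particular page.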

Thanks for the reply, metulburr.

I actually tried using the time.sleep method but I was placing it right after the driver.get(url).

I just now put it after the WebDriverWait, after seeing your reply, and it's working so far, which is surprising and confusing to me...

I figured it was hanging at or before the WebDriverWait, not after it.

Either way, thank you for your help.