Python Forum

Scraping JavaScript websites with Selenium
Hello, I hope this is the right subforum for this. I'm trying to scrape a page like this one: https://www.rezultati.com/utakmica/OUNkA...jek-meca;1

Right now, the best I can do is the classic approach: find the elements by XPath and read the text of each one.
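from selenium.webdriver.common.by import By

# `driver` is an already-open Selenium WebDriver with the page loaded.
# Selenium 4.3+ removed find_elements_by_xpath; find_elements(By.XPATH, ...) replaces it.
data = []
tables = driver.find_elements(By.XPATH, "//div[@id='match-history-content']/div[contains(@id, 'tab-mhistory-')]/table/tbody")
for table in tables:
    this_table = []
    for row in table.find_elements(By.XPATH, ".//tr"):
        this_row = []
        # One WebDriver call per cell: simple, but slow on large tables.
        for one_element in row.find_elements(By.XPATH, ".//td"):
            this_row.append(one_element.get_attribute("innerText"))
        this_table.append(this_row)
    data.append(this_table)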

# parse...
Is there another way to scrape this, for example by using requests to fetch the data directly in some JSON-like format? When I open the "Network" tab in Chrome's developer tools, the response to every request is some kind of JavaScript code. Is what I'm asking for even possible for this particular website, or for any other? How hard would you say it is? Maybe it's best for me to just keep doing it with the code above.

Thanks for your help!
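Let me answer my own question. This seems to be the fastest way to scrape it: fetch the rendered HTML from the browser once, then parse it locally with lxml instead of making a separate WebDriver call for every row and cell.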

from lxml import html
# open some link and wait
data = []
innerHTML = driver.execute_script("return document.body.innerHTML")
htmlElem = html.document_fromstring(innerHTML)
tables = htmlElem.xpath("//div[@id='match-history-content']/div[contains(@id, 'tab-mhistory-')]/table/tbody")
for table in tables:
    this_table = []
    for row in table.xpath(".//tr"):
        this_row = []
        for elm in row.xpath(".//td"):
            this_row.append(elm.text_content())
        this_table.append(this_row)
    data.append(this_table)
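
To try the parsing step without a browser, here is a minimal sketch that runs the same XPath against an inline HTML string. SAMPLE_HTML and the extract_tables helper are made up for illustration, and the real page's markup may differ. It also strips whitespace from each cell, which the code above does not do.

```python
from lxml import html

# Hypothetical sample standing in for the rendered page; the real markup may differ.
SAMPLE_HTML = """
<div id="match-history-content">
  <div id="tab-mhistory-1-history">
    <table><tbody>
      <tr><td>15:0</td><td>1-0</td></tr>
      <tr><td>30:0</td><td></td></tr>
    </tbody></table>
  </div>
  <div id="tab-mhistory-2-history">
    <table><tbody>
      <tr><td>0:15</td><td>0-1</td></tr>
    </tbody></table>
  </div>
</div>
"""

# Same XPath as in the post, split across two lines for readability.
TABLES_XPATH = ("//div[@id='match-history-content']"
                "/div[contains(@id, 'tab-mhistory-')]/table/tbody")


def extract_tables(page_html):
    """Return one list per table, each holding one list of cell texts per row."""
    root = html.document_fromstring(page_html)
    return [
        [[cell.text_content().strip() for cell in row.xpath(".//td")]
         for row in tbody.xpath(".//tr")]
        for tbody in root.xpath(TABLES_XPATH)
    ]


data = extract_tables(SAMPLE_HTML)
print(data)
```

With a live session, the string passed in would be the result of driver.execute_script("return document.body.innerHTML") from the code above.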