How can get url from JavaScript in Selenium (Python 3)?
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-15855.html

How can get url from JavaScript in Selenium (Python 3)? - m0ntecr1st0 - Feb-03-2019

I am writing a parser for https://www.oddsportal.com. See this URL: https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR

I have run into the following problem: I need to get the URLs from this block. How can I get all the absolute URLs from this menu? If writing out all the URLs takes too long, the URL for "Home/Away": "2nd Half" alone would do as an example. I think these URLs are built by JS (and maybe Ajax), and I don't know how to walk through them.

def main(url):
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    driver = webdriver.Chrome(chrome_options=options)
    driver.get(url)

def get_url():
    base_url = 'https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR'
    for i in ???:
        first_part = ???
        second_part = ???
        # second_part is a variable here, not the literal string 'second_part'
        url = base_url + '#' + first_part + ';' + second_part
        main(url)

RE: How can get url from JavaScript in Selenium (Python 3)? - snippsat - Feb-03-2019

Your function setup is wrong; just drop the functions for now if you are unsure how they work. It's a really messy site to deal with, so not the easiest one to start with if you are new to this. Here is a way to get the values from the first line. It can also be easier to send browser.page_source to BS for parsing. Turn off headless while testing.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
import time

#--| Setup
chrome_options = Options()
#chrome_options.add_argument("--headless")
#chrome_options.add_argument('--disable-gpu')
#chrome_options.add_argument('--log-level=3')
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=chrome_options)

#--| Parse or automation
browser.get('https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR#1X2;4')
# Wait for the JavaScript-rendered table before reading the source
time.sleep(3)
# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
table_first_line = soup.select('#odds-data-table > div > table > tbody > tr:nth-of-type(1)')
print(table_first_line[0].text.strip())
browser.quit()

This gets all the values, but they need some clean-up (whitespace). Look at Web-scraping part-2.
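
The whitespace clean-up mentioned above could look roughly like the sketch below. It uses only the standard library; `clean_row_text` is a hypothetical helper name, and the sample string is made up to resemble the output of `.text` on a table row, not taken from the real page.

```python
import re


def clean_row_text(raw: str) -> list[str]:
    """Split scraped row text into non-empty, whitespace-trimmed fields."""
    # Break the text at line breaks and tabs, which usually separate cells
    parts = re.split(r"[\r\n\t]+", raw)
    # Collapse runs of whitespace inside each field and drop empty fields
    return [" ".join(part.split()) for part in parts if part.strip()]


# Made-up sample that imitates the messy text of one table row
sample = "\n  Bookmaker   A \n\t 2.10 \n 3.40\n\n  3.60  \n"
fields = clean_row_text(sample)
print(fields)
```

In the script above, the same helper could be applied to `table_first_line[0].text` in place of the plain `.strip()` call.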

RE: How can get url from JavaScript in Selenium (Python 3)? - m0ntecr1st0 - Feb-03-2019

(Feb-03-2019, 07:33 PM)snippsat Wrote: Your functions setup in is wrong,just drop functions for now,if you unsure how they work.

Thanks for the answer. But I need to get a list of URLs. I already know how to get the text from cells. I also need a function, so that I can run it for each URL in a for loop.

RE: How can get url from JavaScript in Selenium (Python 3)? - m0ntecr1st0 - Feb-19-2019

(Feb-03-2019, 07:33 PM)snippsat Wrote: Your functions setup in is wrong,just drop functions for now,if you unsure how they work.

Hi. I wrote the code with pure lxml, and it works faster than yours. Yes, it's a really messy site to deal with, so not the easiest one to start with if you are new to this. It has many pitfalls, but after it other sites will be easy.

import time
import lxml.html

# Wait for the JavaScript-rendered table before reading the source
time.sleep(3)
page = browser.page_source
doc = lxml.html.fromstring(page)
# First row with class "lo" (cssselect needs the cssselect package installed)
row = doc.cssselect("tr.lo")[0]
print(row.text_content().strip())
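
For the list of URLs asked about earlier in the thread, one possible approach is to build them from the `base_url + '#' + first_part + ';' + second_part` pattern in the first post. The sketch below assumes that pattern holds for every tab. `build_urls` is a hypothetical helper. Only `1X2` and `4` come from the URL used in the thread; the other fragment values are placeholders that would have to be read from the page's menu.

```python
from itertools import product

BASE_URL = "https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR"


def build_urls(base_url, first_parts, second_parts):
    """Return one URL per (first_part, second_part) combination."""
    # Drop any fragment already present so it is not duplicated
    base = base_url.split("#", 1)[0]
    return [
        f"{base}#{first};{second}"
        for first, second in product(first_parts, second_parts)
    ]


# "1X2" and "4" appear in the thread; the other values are placeholders
first_parts = ["1X2", "PLACEHOLDER_TAB"]
second_parts = ["4", "PLACEHOLDER_SCOPE"]

urls = build_urls(BASE_URL, first_parts, second_parts)
for url in urls:
    # In a real run, each URL would be passed to browser.get(url)
    print(url)
```

Not every combination is necessarily a valid tab on the site, so the resulting list may need filtering against what the menu actually offers.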