Python Forum

Full Version: How can get url from JavaScript in Selenium (Python 3)?
I'm writing a parser for https://www.oddsportal.com

See this url - https://www.oddsportal.com/soccer/englan...d-nNNqedbR

I've run into the following problem: I need to get the URLs from this block
[Image: RBcgO.png]

How can I get all the absolute URLs from this menu?
If listing all of them takes too long, the URL for "Home/Away" : "2nd Half" alone would be enough as an example.

I think these URLs are built by JavaScript (and maybe Ajax), and I don't know how to walk through them.
[Image: 3TZ6P.png]
[Image: z5jOF.png]
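One way to approach this (a sketch, not the real page markup): once the page has rendered, the tab menu is ordinary `<a>` elements, so you can collect their `href` attributes and resolve them against the page URL. The snippet below uses a made-up menu fragment and hypothetical fragment codes purely to show the mechanics, with only the standard library:

```python
# Sketch: pull hrefs out of a menu block and make them absolute.
# SNIPPET and the '#...;2' codes are made-up stand-ins for the real markup.
from html.parser import HTMLParser
from urllib.parse import urljoin

SNIPPET = '''
<ul id="bettype-tabs">
  <li><a href="#1X2;2">Home/Away</a></li>
  <li><a href="#cs;2">Correct Score</a></li>
</ul>
'''

class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)

base = 'https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR'
parser = HrefCollector()
parser.feed(SNIPPET)
# urljoin resolves a fragment-only href against the page URL
absolute = [urljoin(base, h) for h in parser.hrefs]
print(absolute)
```

With Selenium you would feed `browser.page_source` (or `find_elements` on the anchors) into the same idea instead of a static snippet.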

from selenium import webdriver

def main(url):
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    driver = webdriver.Chrome(chrome_options=options)
    driver.get(url)

def get_url():
    base_url = 'https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR'
    for i in ???:
        first_part = ???
        second_part = ???
        url = base_url + '#' + first_part + ';' + second_part
        main(url)
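For what it's worth, the fragment seems to be a bet-type code plus a scope code separated by `;` (the `#1X2;4` form appears later in this thread). A minimal sketch of the loop, where the code lists are assumptions you would fill in from the real menu hrefs:

```python
# Sketch of filling in the loop above. '1X2' and '4' appear elsewhere
# in this thread; any other codes you add are site-specific and must
# be read from the menu, they are not guessed here.
base_url = ('https://www.oddsportal.com/soccer/england/'
            'premier-league/wolves-newcastle-utd-nNNqedbR')

bet_types = ['1X2']   # extend with the real bet-type codes
scopes = ['4']        # extend with the real scope codes

def build_urls(base, types, scopes):
    """Combine every bet-type code with every scope code into a fragment URL."""
    return [f'{base}#{t};{s}' for t in types for s in scopes]

urls = build_urls(base_url, bet_types, scopes)
print(urls)
```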
Your function setup is wrong; just drop the functions for now if you're unsure how they work.
It's a really messy site to deal with, so not the easiest to start with if you're new to this.

To show a way to get the values from the first line: it can also be easier to send browser.page_source to BeautifulSoup for parsing.
Turn off headless while testing.
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
import time

#--| Setup
chrome_options = Options()
#chrome_options.add_argument("--headless")
#chrome_options.add_argument('--disable-gpu')
#chrome_options.add_argument('--log-level=3')
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe',
                           chrome_options=chrome_options)
#--| Parse or automation
browser.get('https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR#1X2;4')
time.sleep(3)  # give the JavaScript-rendered table time to load
# Give source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
table_first_line = soup.select('#odds-data-table > div > table > tbody > tr:nth-of-type(1)')
print(table_first_line[0].text.strip())
browser.quit()
This gets all the values, but they need some cleanup (whitespace).
Output:
bet-at-home  2.052.404.8490.0%
Look at Web-scraping part-2.
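About that cleanup: the values run together because `.text` on the whole row concatenates every cell. Reading each `<td>` separately keeps them apart. A sketch against a static stand-in row (the real table markup may differ):

```python
from bs4 import BeautifulSoup

# Static stand-in for one odds row; the real oddsportal markup may differ.
ROW = ('<table><tbody><tr class="lo">'
       '<td>bet-at-home</td><td>2.05</td><td>2.40</td>'
       '<td>4.84</td><td>90.0%</td></tr></tbody></table>')

soup = BeautifulSoup(ROW, 'html.parser')
# One string per cell instead of one run-together row string
cells = [td.get_text(strip=True) for td in soup.select('tr.lo td')]
print(cells)
```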
(Feb-03-2019, 07:33 PM)snippsat Wrote: Your function setup is wrong; just drop the functions for now if you're unsure how they work.

Thanks for the answer, but I need to get a list of URL addresses. I know how to get text from cells...


And I need a function so that I can run it for each URL in a for loop.
(Feb-03-2019, 07:33 PM)snippsat Wrote: Your function setup is wrong; just drop the functions for now if you're unsure how they work.
It's a really messy site to deal with, so not the easiest to start with if you're new to this.

To show a way to get the values from the first line: it can also be easier to send browser.page_source to BeautifulSoup for parsing.

Hi. I wrote the code with pure lxml and it works faster than yours.

Yes, it's a really messy site to deal with, so not the easiest to start with if you're new to this; there are many pitfalls, but after this one the others will be easy.

import time
import lxml.html

time.sleep(3)  # wait for the JavaScript-rendered table before grabbing the source
page = browser.page_source

doc = lxml.html.fromstring(page)
row = doc.cssselect("tr.lo")[0]
print(row.text_content().strip())