Python Forum

Hi, as the title suggests, I would like to extract data from sports betting sites. With this code I download the HTML of the site:

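from bs4 import BeautifulSoup
import sys
import urllib.error
import urllib.request

url = "https://sports.bwin.com/"
try:
    page = urllib.request.urlopen(url)
except urllib.error.URLError as err:
    # Stop here: without a response there is nothing to parse
    sys.exit(f"An error occurred: {err}")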

soup = BeautifulSoup(page, 'html.parser')
print(soup)
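A possible refinement of the download step, as a sketch only: some sites refuse or alter responses for urllib's default "Python-urllib/3.x" User-Agent, so this version sends a browser-like header, sets a timeout and decodes the body with the charset the server declares. The header value and the helper names `build_request` and `download_html` are hypothetical, not something the site is known to require.

```python
import urllib.request

# Hypothetical browser-like User-Agent; urllib's default identifies itself as Python
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def build_request(url, headers=None):
    """Return a Request for url carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)


def download_html(url, timeout=10):
    """Fetch url and decode the body using the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the Content-Type header names no charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Even with these headers, this only returns the static HTML the server sends; anything the page builds later with JavaScript will not be in it.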

Then I'm stuck. To extract the participating teams I tried the re module with this code, but it doesn't work:

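from bs4 import BeautifulSoup
import re
import sys
import urllib.error
import urllib.request

url = "https://sports.bwin.com/"
try:
    page = urllib.request.urlopen(url)
except urllib.error.URLError as err:
    # Stop here: without a response there is nothing to parse
    sys.exit(f"An error occurred: {err}")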

soup = BeautifulSoup(page, 'html.parser')
#print(soup)

regex = re.compile("participant")
content_lis = soup.find_all('div', attrs={'class': regex})
print(content_lis)
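When that list prints empty, a quick check is whether the raw HTML contains any such divs at all. The sketch below uses the standard library's html.parser instead of BeautifulSoup so it runs with no extra installs; `DivClassCollector` and `matching_div_classes` are hypothetical helper names. If it finds nothing in the downloaded page while the browser clearly shows teams, the regex is not the problem: the content is probably added later by JavaScript.

```python
import re
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collect the class attribute of every <div> whose class matches a regex."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value); value is None for bare attributes
        classes = dict(attrs).get("class") or ""
        if self.pattern.search(classes):
            self.matches.append(classes)


def matching_div_classes(html, pattern="participant"):
    """Return the class strings of <div>s in raw HTML whose class matches pattern."""
    collector = DivClassCollector(pattern)
    collector.feed(html)
    collector.close()
    return collector.matches
```

For example, `matching_div_classes(urllib.request.urlopen(url).read().decode("utf-8", errors="replace"))` on a freshly downloaded page (the `page` object above has already been consumed by BeautifulSoup).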

Thank you to whoever can help me.

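That site uses a lot of JavaScript, so much of the content, including the teams, is probably generated in the browser and is not in the HTML that urllib downloads.
You will need to use Selenium to run the JavaScript; after you do this, you can finish with Beautiful Soup,
or do it all in Selenium.
There are good Selenium tutorials within the web scraping tutorials on this forum,
see: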
web scraping part 1
web scraping part 2
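A minimal sketch of the Selenium step, assuming Selenium 4 with a local Chrome install. The CSS selector `[class*='participant']` is only a guess based on the regex in the question, not the site's confirmed markup, and `fetch_rendered_html` is a hypothetical name. It waits explicitly for a matching element, which is what lets it cope with content loaded via Ajax after the initial page load.

```python
def fetch_rendered_html(url, wait_selector="[class*='participant']", timeout=15):
    """Load url in headless Chrome, wait for JavaScript to render, return the HTML."""
    # Imported inside the function so this helper can be defined without Selenium present
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one element matches the selector (TimeoutException otherwise)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

To finish with Beautiful Soup, pass the result to `BeautifulSoup(fetch_rendered_html(url), 'html.parser')` and run the same `find_all` as in the question. To do it all in Selenium instead, `driver.find_elements(By.CSS_SELECTOR, wait_selector)` returns elements whose `.text` holds the visible text.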

I'm working on a similar project. Selenium will easily do that work for you, especially if you encounter a site that uses Ajax.

The tutorial links in post #2 are still valid.
They apply to all types of sites, including sports sites.

nestor

Larz60+

law

Larz60+