Python Forum

Extract data from sports betting sites
Hi, as the title suggests, I would like to extract data from sports betting sites. With this code I can download the site's HTML:
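import sys
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

url = "https://sports.bwin.com/"
try:
    page = urllib.request.urlopen(url)
except urllib.error.URLError as err:
    # Stop here: without a response there is nothing to parse.
    sys.exit(f"An error occurred: {err}")

# Parse the raw HTML returned by the server (no JavaScript is executed).
soup = BeautifulSoup(page, 'html.parser')
print(soup)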
But then I get stuck. To extract the participating teams, I tried the re module with the code below, but it doesn't work:
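import re
import sys
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

url = "https://sports.bwin.com/"
try:
    page = urllib.request.urlopen(url)
except urllib.error.URLError as err:
    # Stop here: without a response there is nothing to parse.
    sys.exit(f"An error occurred: {err}")

soup = BeautifulSoup(page, 'html.parser')
#print(soup)

# Match every <div> whose class attribute contains "participant".
regex = re.compile("participant")
content_lis = soup.find_all('div', attrs={'class': regex})
print(content_lis)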
Thanks to anyone who can help.
That site uses a lot of JavaScript.
You will need to use Selenium to render the JavaScript-generated content; after that, you can finish with Beautiful Soup.
Or do it all in Selenium.
There are good Selenium tutorials within the web scraping tutorials on this forum,
see:
web scraping part 1
web scraping part 2
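As a rough illustration of the "Selenium first, then Beautiful Soup" approach, here is a minimal sketch. It assumes Chrome and a matching driver are available, that a fixed pause is enough for the page to render, and that the team names sit in div elements whose class contains "participant" (taken from the question's regex, not checked against the real site). The function names are made up for this example.

```python
import re
import time

from bs4 import BeautifulSoup


def extract_participants(html):
    """Return the text of every <div> whose class contains 'participant'.

    The class name is an assumption carried over from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", class_=re.compile("participant"))
    return [div.get_text(strip=True) for div in divs]


def fetch_rendered_html(url, wait_seconds=5):
    """Load the page in a real browser and return the HTML after rendering."""
    # Imported here so the parsing helper above works without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude pause so the JavaScript can run
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling extract_participants(fetch_rendered_html("https://sports.bwin.com/")) would then return whatever text those div elements hold once the page has rendered. If the result is empty, inspect the rendered page in the browser's developer tools to find the real class names.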
I'm working on a similar project. Selenium will easily do that work for you, especially if you encounter a site using Ajax.
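For Ajax-heavy pages, an explicit wait is usually more reliable than a fixed pause, and the whole job can stay inside Selenium. The sketch below is only an illustration: the CSS selector is an assumption based on the "participant" class from the question, the function name is invented for this example, and Chrome with a matching driver is assumed to be installed.

```python
def get_participant_texts(url, css_selector="div[class*='participant']", timeout=15):
    """Open the page, wait for matching elements to appear, return their text."""
    # Imported inside the function so the definition loads without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(url)
        # Poll until at least one matching element exists in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        return [element.text.strip() for element in elements]
    finally:
        driver.quit()  # always close the browser, even on errors
```

The wait returns as soon as the elements show up, so it is not tied to a guessed delay. The selector will almost certainly need adjusting after inspecting the live page.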
The tutorial links in post #2 are still quick and valid.
They apply to all types of sites, including sports sites.