Python Forum

Web scraping sports betting websites
Hello everyone,

As part of improving my Python, I want to scrape some sports betting websites and compare the odds.
I have a few websites in mind, and for two of them I was able to find a websocket or a JS file that gives me the data I wanted.

For the others, there is no "easy" source of information.

For example, take this website: https://www.betclic.com/en/sports-betting/football-s1
It lists all the soccer games currently available on their website.
Only a little information is shown: the "Match Results" odds and the "Double Chance" odds.
If we click on a game, more bets become available.

My goal is to get all the bets available per game, without having to "enter" every game.
I don't want to do something like this:
import requests

for game in all_games:  # all_games: one dict per game, each with a "url" key
    url = game["url"]
    resp = requests.get(url)
    # get all odds from resp
Doing that means doing 500 requests, which I would like to avoid.
I would prefer to have "one" request with all odds.

I already searched to see whether someone else had done this, but I guess not.
The only thing I found was an XML file (http://xml.cdn.betclic.com/odds_en.xml) which looks roughly like what I want, but it is heavy (10 MB, and my internet connection can be slow sometimes) and the odds can differ between the XML file and the website.

I was wondering if there is an equivalent of that file directly on Betclic's website.

Maybe there is nothing, and I will scrape the XML file :)

Thanks in advance for your help.
(Mar-19-2022, 10:07 AM) KoinKoin wrote: I was wondering if there is an equivalent of that file directly on Betclic's website.
Betclic has an API, but it is not open to the public in terms of usage or documentation; there is an old GitHub account.
These betting sites often do not want their sites scraped for free (or at all), so their API is usually behind a paywall.
odds-api has data from Betclic and many more; 500 requests per month are free.
There is a link you can find by looking at the network data; it returns JSON rather than the big XML (with everything).
Here is a quick test that parses something from it. The data is live, so it needs to be re-run to get updates.
import requests
from requests.structures import CaseInsensitiveDict

url = "https://offer.cdn.begmedia.com/api/pub/v4/events?application=1&countrycode=no&fetchMultipleDefaultMarkets=true&hasSwitchMtc=true&language=en&limit=40&offset=0&sitecode=gben&sortBy=ByLiveRankingPreliveDate&sportIds=1"

headers = CaseInsensitiveDict()
headers["accept"] = "application/json"
headers["user-agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36"
headers["accept-language"] = "nb-NO,nb;q=0.9,no;q=0.8,nn;q=0.7,en-US;q=0.6,en;q=0.5"
resp = requests.get(url, headers=headers)
print(resp.status_code)
>>> resp.json()[0]['name']
'Osasuna - Levante'
>>> resp.json()[0]['liveData']
{'isEnded': False,
 'liveId': 3001236058,
 'refreshDelay': 0,
 'scoreboard': {'beginDate': '2022-03-19T18:05:00.764861Z',
                'cardActionsForTeam1': [],
                'cardActionsForTeam2': [],
                'elapsedTime': 1959,
                'endedPeriodScores': [],
                'isRestricted': False,
                'liveDisplayStatus': 0,
                'liveId': 3001236058,
                'liveIdOld': -1293731238,
                'period': 2,
                'periodFullTime': 45,
                'periodScore': {'periodIndex': 0,
                                'periodType': 15494,
                                'score': {'contestant1': 0, 'contestant2': 0}},
                'periodType': 15494,
                'scoreboardType': 7,
                'scorersForTeam1': [],
                'scorersForTeam2': [],
                'sportId': 1,
                'totalScore': {'contestant1': 0, 'contestant2': 0}}}
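The URL above carries limit=40 and offset=0, which suggests the endpoint is paged. Below is a minimal sketch of walking those pages, using only the standard library. It assumes (not verified against the live service) that the endpoint returns a JSON list and that a page shorter than limit marks the end. The names fetch_page and iter_events are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same endpoint and query values as the URL in the test above.
BASE_URL = "https://offer.cdn.begmedia.com/api/pub/v4/events"
BASE_PARAMS = {
    "application": 1,
    "countrycode": "no",
    "fetchMultipleDefaultMarkets": "true",
    "hasSwitchMtc": "true",
    "language": "en",
    "sitecode": "gben",
    "sortBy": "ByLiveRankingPreliveDate",
    "sportIds": 1,
}


def fetch_page(offset, limit):
    """Fetch one page of events and return the decoded JSON."""
    query = urlencode({**BASE_PARAMS, "limit": limit, "offset": offset})
    req = Request(
        f"{BASE_URL}?{query}",
        headers={"accept": "application/json", "user-agent": "Mozilla/5.0"},
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def iter_events(fetch=fetch_page, limit=40, max_pages=25):
    """Yield events page by page until a short or empty page comes back."""
    for page in range(max_pages):
        events = fetch(page * limit, limit)
        yield from events
        if len(events) < limit:  # assumption: a short page means the end
            break
```

Something like names = [e["name"] for e in iter_events()] would then collect every event name in a handful of requests. Each event would still carry only the default markets that this endpoint returns.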
Hello snippsat, thanks for your reply.
You're right. I had already seen that JSON in the network data; unfortunately, it only has 2 bets available:
"Match Result" and "Double Chance".
There are other types of bets available per game whose odds I would like to extract.

It might not be possible other than with the XML file I found, or by going inside each game to get everything (so for 500 games, I need 500 requests :P).

I was just wondering if some people might have other ideas :)
One way to optimize this is to check whether they have an API or some other structured data source that provides the odds for multiple games in a single request.
As for the XML file, it could be a workaround, but given its size and the potential discrepancies with the website, there may be better solutions.
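If the XML file does end up being the fallback, its size is easier to live with when it is parsed as a stream instead of being loaded whole. Here is a minimal sketch with the standard library's iterparse. The tag name "match" and the attribute "name" in the usage note are placeholders, since the real layout of the feed is not shown in this thread.

```python
import xml.etree.ElementTree as ET


def iter_elements(source, tag):
    """Yield each <tag> element from source as soon as it is fully parsed.

    source can be a filename or a binary file-like object. Use each element
    before asking for the next one: it is cleared when the generator resumes.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # release the element's children, text and attributes
```

With the feed saved locally as odds_en.xml, something like for m in iter_elements("odds_en.xml", "match"): print(m.get("name")) would walk it one element at a time, once the placeholder names are replaced with the ones the file actually uses.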
I wish I had a specific solution to offer, but sometimes, these websites don't make it easy for scrapers. Keep exploring and experimenting, and you might come across a more efficient way to gather the data you need.
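If one request per game turns out to be unavoidable, the requests can at least run concurrently. This does not reduce the number of requests, only the total waiting time. The sketch below uses a small thread pool from the standard library; fetch_text and fetch_all are names made up for this example, and the worker count is an arbitrary choice that should stay low.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_text(url):
    """Download one page and return its body as text."""
    req = Request(url, headers={"user-agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=fetch_text, max_workers=8):
    """Fetch every URL on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of urls and re-raises the
        # first exception it meets while the results are being collected.
        return list(pool.map(fetch, urls))
```

Called as pages = fetch_all(game_urls), where game_urls is the list of per-game URLs collected from the overview page, this returns one response body per game in the same order.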