 Web Scraping Sportsbook Websites
#1
I have been out of coding for nearly 20 years now. I retired from programming to be a professional sports handicapper in my mid 20s. I used to code C++ and Visual Basic. Now I am revisiting my former profession to aid in my current profession. I was curious whether, in your opinion, Python would be the best language for me to dive into for the project that I have. Here is basically an outline of what I am looking to do:

I want to query several websites at a time in real time, update live spreads, and compare them to each other to find the biggest differential. For example, FanDuel has the 76ers -3, DraftKings has them -5, and BetAmerica has them -4. The program would then let me know that FanDuel and DraftKings have the biggest differential, which is 2 points. I would want to do this for four lines per game (total, spread, first-half total and first-half spread) for every NCAA game and NBA game, constantly as the game progresses, perhaps entering a minimum differential of 2 or 3. There might be 30 games going at the same time, which would translate to 120 line comparisons.
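The comparison logic described above could be sketched like this (the book names and numbers come from the example; the threshold is the minimum differential mentioned, and the function name is just illustrative):

```python
# Sketch of the comparison step: for one market, find the pair of books
# with the largest differential and flag it if it meets a minimum threshold.
from itertools import combinations

def biggest_differential(lines):
    """lines: dict of book name -> line. Returns (book_a, book_b, diff)."""
    (a, sa), (b, sb) = max(
        combinations(lines.items(), 2),
        key=lambda pair: abs(pair[0][1] - pair[1][1]),
    )
    return a, b, abs(sa - sb)

spreads = {"FanDuel": -3.0, "DraftKings": -5.0, "BetAmerica": -4.0}
book_a, book_b, diff = biggest_differential(spreads)
if diff >= 2:  # minimum differential threshold
    print(f"{book_a} vs {book_b}: {diff} point differential")
```

Running the same check over 120 such dicts (four markets for each of 30 games) is just a loop over this function.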

Any tips to get me started would be greatly appreciated.
#2
OK, so do you think we might be biased here?
I would suggest that Python is definitely the way to go.

Here are some packages that might be useful: https://pypi.org/search/?q=%27website+monitoring%27
#3
OK, I have gotten a long way in the past 24 hours thanks to YouTube, but I am stuck on my current issue. I am trying to find text that comes after certain text in the HTML file, and not simply after a tag, because these websites populate the same tag for every bet. For example, the first tag I search for is <span class="KambiBC-mod-outcome__odds">, but that class is the same for every bet they have on a particular game. Using find_all and len(), there are anywhere from 120 to 150 instances. The first two are important to me, but the other two that I need will be a variable distance further along. I am hoping to truncate the string into two sections, or just remove the first part after I have extracted my data and then move on to the second part. Split doesn't seem to work with the large string that I have. I am looking for all HTML after the word "Halftime", because once again I could find the first two elements using find_all. Or any other solution that you can think of. To visualize with code, because my explanation isn't the clearest:

Sugarhouse_Line = Sugarhouse.find_all(class_='KambiBC-mod-outcome__odds')
Sugarhouse_Line items 0 and 1 are vital to me. The other parts I need are somewhere between index 60 and 80, but that changes every game and will change throughout the game. The only other constant I can find is that they come after the word Halftime, so I would either want to truncate everything before the word Halftime, or do a find_all on only the HTML after it.
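For what it's worth, str.partition (or str.split) works on a page_source string regardless of its size, so the "truncate at a keyword, then re-parse" idea can be sketched like this. The markup below is made up for illustration; only the class name comes from the thread:

```python
# Minimal sketch: cut the HTML at the keyword, then parse only the tail.
from bs4 import BeautifulSoup

html = """
<span class="KambiBC-mod-outcome__odds">-110</span>
<span class="KambiBC-mod-outcome__odds">+100</span>
<h4>Halftime</h4>
<span class="KambiBC-mod-outcome__odds">-105</span>
<span class="KambiBC-mod-outcome__odds">-115</span>
"""

# str.partition returns (before, separator, after); we only need "after".
_, _, after_halftime = html.partition("Halftime")

# Parse just the trailing fragment as its own soup and take the first two odds.
half_soup = BeautifulSoup(after_halftime, "html.parser")
half_odds = [tag.get_text() for tag in
             half_soup.find_all(class_="KambiBC-mod-outcome__odds")]
print(half_odds[:2])
```

Because the fragment is re-parsed as its own soup, the halftime odds become items [0] and [1], just like the full-game odds were in the original document.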
#4
Please show your code so far.
Also, please provide a few of the URLs you are trying to scrape.
#5
Here is an example of a webpage I am trying to scrape:
https://pa.sportsbook.fanduel.com/sports/event/820552.3

I would like to get the data from after Neman Grodno (right now it is +135, but that might change). I can get that with no problem because it is the first instance, so index [0] of my array.

I would also like to get the data from after Neman Grodno under the dropdown Half-Time Result. This position will change from game to game depending on how many bet offers they have.

Both of these, along with every other wager on the page, are listed under the class 'selectionprice'.

The easiest way that I can see is to set up a variable and populate it with everything after the keyword, in this case "Half-Time Result", then parse that data again for the class 'selectionprice'; the value I want should be the first element in the array. Here is my code so far:
import requests
import csv
from bs4 import BeautifulSoup
import urllib.request
import random
import re
from selenium import webdriver
chrome_path = r"C:\Users\user\Desktop\chromedriver.exe"

HTML_Header = """
<HTML>
<HEAD>
<TITLE>NBA Comparisons</TITLE>
</HEAD>
<BODY>
<CENTER>
<H1>NBA Games</H1><BR>
<TABLE cellpadding=10 border=1>
<TR>
<TD col width=150><CENTER>Matchup</CENTER></TD>
<TD col width=150><CENTER>DraftKings</CENTER></TD>
<TD col width=150><CENTER>Fanduel</CENTER></TD>
<TD col width=150><CENTER>Caesars</CENTER></TD>
<TD col width=150><CENTER>BetAmerica</CENTER></TD>
</TR>
"""

HTML_Body = ""

HTML_Footer = "</TABLE></CENTER></BODY></HTML>"

Urls = []
Teams = ''

with open(r'M:\SportsBooks.csv') as csvfile:  # raw string so the backslash isn't treated as an escape
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        Urls.append(row)

x = 0
y = len(Urls)
print(y)
z = 0
xyz=0

DK_web_FT = [None] * len(Urls)
DK_web_HT = [None] * len(Urls)
FD_web = [None] * len(Urls)


# Open all the webpages
while True:
    DK_web_FT[z] = webdriver.Chrome(chrome_path)
    DK_web_HT[z] = webdriver.Chrome(chrome_path)
    FD_web[z] = webdriver.Chrome(chrome_path)
    DK_web_FT[z].get(str(Urls[x + 1])[2:-2])
    DK_web_HT[z].get(str(Urls[x + 2])[2:-2])
    FD_web[z].get(str(Urls[x + 3])[2:-2])

    x += 4
    z += 1

    # Change to 3 for testing should be y
    if x >= y:
        break


x = 0
z = 0

# Bulk of the programming below to loop until it populates data for every game in the excel sheet.

while True:

    Teams = str(Urls[x])[2:-2]

    # Retrieve the data from the sites
    DK_Content_FT = DK_web_FT[z].page_source
    DK_Content_HT = DK_web_HT[z].page_source
    FD_Content = FD_web[z].page_source

    # Parse the data
    DK_FT = BeautifulSoup(DK_Content_FT, 'html.parser')
    DK_HT = BeautifulSoup(DK_Content_HT, 'html.parser')
    FD = BeautifulSoup(FD_Content, 'html.parser')

    #DK Lines
    DK_Full_Game_Line = DK_FT.find(class_='sportsbook-odds american default-color')
    DK_Half_Game_Line = DK_HT.find(class_='sportsbook-odds american default-color')
    
    #FD Lines
    FD_Full_Game_Line = FD.find(class_='selectionprice')

    # Add to the counter
    x += 4
    z += 1

    if x >= y:
        break
#6
I'm looking at this now.
How do you get the URL https://pa.sportsbook.fanduel.com/sports/event/820552.3?
What would be the 'click' order if you were doing it manually from the website homepage?
#7
I manually populate the CSV file with links to each game. It will be a bit of a hassle to set up each day, but the different sportsbooks have different naming conventions for each team, which would be difficult to match up in code. I did find a solution using a regex split:
    #Finding Halftime lines
    SplitFox = "First-Half Result"
    TestFox = re.split(SplitFox, Fox_Content)
    FoxHTFinal = BeautifulSoup(TestFox[2], 'html.parser')
    FoxHT_Unparsed = FoxHTFinal.find(class_='button__bet__odds')
However, I'm now running into the issue that the HTML for particular lines doesn't appear unless the DIV is activated by clicking on it. I'm trying to figure that part out now.

Fanduel seems to be my biggest issue which is the link that I posted.
#8
I am clicking inside Selenium. You may not see a result from me until later today or possibly even tomorrow.
To click, add the following to your code:
from selenium.webdriver.common.by import By

...
self.browser.find_element(By.XPATH, '/html/body/div[1]/div/div/div[5]/main/div/div/div/div/div[1]/div[3]/div[2]/div/div/div[4]/div[10]/div/h4').click()
#9
That worked, thank you for pointing me in the right direction. The problem will arise when there are 30 different offerings on a game: the Half-Time Result might not be the 9th offering as in this example, it could be the 20th. I tried using By.LINK_TEXT but couldn't get it to work, I'm assuming because it is not technically a link.

FD_web[z].find_element_by_xpath("//h4[@class='market-name-title' and text()='Half-Time Result']").click()
Got it working. Thank you for your help.
#10
I often use Selenium to expand all of the JavaScript, then switch to BeautifulSoup;
then you can use find or find_all.
example:
        browser.get(self.mapurl)
        time.sleep(2)
        source = browser.page_source
        soup = BeautifulSoup(source, 'lxml')
