Python Forum

Simple screen scrape is baffling me.
Greetings,

I am just learning to screen scrape using Beautiful Soup. I have been doing well so far, and thought I would write a simple script that gets the daily numbers from a lottery website and writes them to a file. It is a simple script, but for the life of me I cannot scrape the numbers. My code is below; if anyone could suggest how to get the data (the date and the numbers), I would greatly appreciate it.


from bs4 import BeautifulSoup
import requests

url = 'https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
details_div = soup.find('div', {'class': 'details'})
balls = details_div.find_all('span', {'class': 'ball'})
for ball in balls:
    print(ball.text.strip())
This is a common problem when scraping: the content you want is generated by JavaScript.
requests only downloads the raw HTML and never runs that JavaScript, so the usual approach you use here (requests plus BeautifulSoup) will not work.
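One quick way to confirm this is to check whether the markup you are looking for appears in the raw response at all. The sketch below is only a rough diagnostic: `markup_in_raw_html` is a hypothetical helper name, and the two sample strings are made up to stand in for a server response, not copied from the lottery page.

```python
def markup_in_raw_html(html: str, class_name: str) -> bool:
    """Return True if class_name appears as an exact HTML class attribute.

    A plain substring check: crude (it misses elements with several
    classes), but enough to tell whether the server sent the elements
    or left them for JavaScript to create later.
    """
    return f'class="{class_name}"' in html


# Made-up stand-ins for response.text (assumption, not the real page).
static_page = '<div class="details"><span class="ball">01</span></div>'
js_page = '<div id="app"></div><script src="draw.js"></script>'

print(markup_in_raw_html(static_page, "ball"))  # True
print(markup_in_raw_html(js_page, "ball"))      # False
```

With the original script you would pass `response.text` and `'ball'`; a `False` result suggests that requests never received those elements.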

One option, which may not be so easy if you are new to this, is to look for the JSON response that the page loads and get the data from there.
import requests

# JSON feed that the page's JavaScript reads the drawing data from
response = requests.get('https://www.palottery.state.pa.us/Custom/feeds/DrawingsData.aspx?game=treasure%20hunt')
# The feed parses to a list; pick one entry by index
js = response.json()[10]
lotto_number = js['DrawingNumbersAsList']
>>> lotto_number
['01', '12', '13', '14', '25']
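The index and key lookups above will raise an exception if the feed ever changes shape, so a slightly more defensive version may help. This is a sketch only: `numbers_from_drawings` is a hypothetical helper, the sample list is made up for illustration, and the only field name taken from the real feed is `DrawingNumbersAsList`.

```python
from typing import Any, List, Optional


def numbers_from_drawings(drawings: Any, index: int = 0) -> Optional[List[str]]:
    """Return the numbers of the drawing at `index`, or None if unavailable."""
    # The parsed feed is expected to be a list of dicts.
    if not isinstance(drawings, list):
        return None
    # Reject indexes that fall outside the list instead of raising IndexError.
    if not (-len(drawings) <= index < len(drawings)):
        return None
    entry = drawings[index]
    if not isinstance(entry, dict):
        return None
    numbers = entry.get("DrawingNumbersAsList")
    return list(numbers) if isinstance(numbers, list) else None


# Made-up sample with the same key as the real feed (assumption).
sample = [{"DrawingNumbersAsList": ["01", "12", "13", "14", "25"]}]

print(numbers_from_drawings(sample))      # ['01', '12', '13', '14', '25']
print(numbers_from_drawings(sample, 10))  # None
```

In the script above this would be called as `numbers_from_drawings(response.json(), 10)`, with a check for `None` before writing anything to a file.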
Another way is to use Selenium or Playwright, which drive a real browser so the JavaScript actually runs.
Here is an example with Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = 'https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx'
browser.get(url)
# Give the page's JavaScript a moment to render the numbers
time.sleep(2)
lotto_number = browser.find_element(By.CSS_SELECTOR, 'div.details')
>>> lotto_number.text
'0112131425'
# The text comes back as one string, so split it into two-digit numbers
>>> lotto_split = zip(*[lotto_number.text[i::2] for i in range(2)])
>>> lotto_split
<zip object at 0x000001D7993D6E00>
>>> list(''.join(tpl) for tpl in lotto_split)
['01', '12', '13', '14', '25']
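The zip trick above works, but plain slicing may be easier to read. A minimal sketch, assuming the element text is one unbroken string of two-digit numbers as shown above; `split_pairs` is a hypothetical helper name, not part of Selenium.

```python
from typing import List


def split_pairs(digits: str, width: int = 2) -> List[str]:
    """Cut a string into consecutive chunks of `width` characters."""
    return [digits[i:i + width] for i in range(0, len(digits), width)]


print(split_pairs("0112131425"))  # ['01', '12', '13', '14', '25']
```

With the Selenium example this would be `split_pairs(lotto_number.text)`. One difference from the zip version: if the string has an odd length, slicing keeps the leftover character as a short final chunk, whereas zip silently drops it.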