Simple screen scrape is baffling me.

monty024 · Apr-26-2023, 02:04 AM

Greetings,

I am just learning to screen scrape using Beautiful Soup. I have been doing well so far, so I thought I would make a simple script that gets the daily numbers from a lottery website and writes them to a file. It is a simple script, but for the life of me I am unable to scrape the numbers. Below is my code. If anyone could suggest how to get the data (the date and the numbers), I would greatly appreciate it.

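from bs4 import BeautifulSoup
import requests

url = 'https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx'
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Look for the container that should hold the drawn balls
details_div = soup.find('div', {'class': 'details'})
if details_div is None:
    # find() returns None when the element is not in the downloaded HTML
    print("No div.details found in the page HTML")
else:
    for ball in details_div.find_all('span', {'class': 'ball'}):
        print(ball.text.strip())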

snippsat · last modified Apr-26-2023, 03:27 PM

This is a common problem when scraping: the content you want is generated by JavaScript.
In that case, ordinary scraping methods like the one you use here will not work.
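One way to confirm this before reaching for other tools is to check whether the element appears at all in the raw HTML that a plain HTTP request receives. The sketch below uses only the standard library's html.parser so it runs without extra packages; the names ClassFinder and has_element are hypothetical helpers, and the div/details defaults are taken from the code in the question. With Beautiful Soup the same check is simply `soup.find('div', {'class': 'details'}) is None`. A False result is consistent with the numbers being injected by JavaScript, though a site can also serve different HTML to scripts than to browsers.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether a start tag with a given class name appears in the HTML."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        for name, value in attrs:
            # The class attribute may hold several space-separated names.
            if name == "class" and value and self.css_class in value.split():
                self.found = True


def has_element(html, tag="div", css_class="details"):
    """Return True if <tag class="... css_class ..."> occurs in the raw HTML."""
    finder = ClassFinder(tag, css_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Usage against the live page (requires network access; hypothetical example):
#   import urllib.request
#   url = "https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx"
#   with urllib.request.urlopen(url, timeout=10) as resp:
#       print(has_element(resp.read().decode("utf-8", errors="replace")))
```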

One option, though maybe not so easy if you are new to this, is to find the JSON response the page loads and get the data from there.

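import requests

# JSON feed that the page appears to load its drawing data from
url = 'https://www.palottery.state.pa.us/Custom/feeds/DrawingsData.aspx?game=treasure%20hunt'
response = requests.get(url, timeout=10)
response.raise_for_status()
# The feed returns a list; [10] picks one entry from it
js = response.json()[10]
lotto_number = js['DrawingNumbersAsList']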

>>> lotto_number
['01', '12', '13', '14', '25']
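The hard-coded index [10] depends on the order of entries in the feed, which could change. A hedged alternative is to collect the number list from every entry that has the DrawingNumbersAsList key and inspect the result. This sketch assumes, as response.json()[10] implies, that the feed is a JSON array of objects; extract_number_lists is a hypothetical helper, and which entry is the latest draw (or which field holds its date) is not stated in the thread, so check the actual response before relying on either.

```python
def extract_number_lists(payload, key="DrawingNumbersAsList"):
    """Collect the number list from every feed entry that has one.

    Assumes the feed is a JSON array of objects. Entries that are not
    dicts, or that lack the key, are skipped.
    """
    results = []
    for entry in payload:
        if isinstance(entry, dict) and key in entry:
            results.append(entry[key])
    return results


# Usage with the feed from the post (requires network access and requests):
#   import requests
#   url = ("https://www.palottery.state.pa.us/Custom/feeds/"
#          "DrawingsData.aspx?game=treasure%20hunt")
#   response = requests.get(url, timeout=10)
#   response.raise_for_status()
#   for numbers in extract_number_lists(response.json()):
#       print(numbers)
```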

Another way is to use Selenium or Playwright.
Here is an example with Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = 'https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx'
browser.get(url)
time.sleep(2)  # give the page's JavaScript time to fill in the numbers
lotto_number = browser.find_element(By.CSS_SELECTOR, 'div.details')

>>> lotto_number.text
'0112131425'
# The text is one run of digits, so split it into two-digit pairs
>>> lotto_split = zip(*[lotto_number.text[i::2] for i in range(2)])
>>> lotto_split
<zip object at 0x000001D7993D6E00>
>>> list(''.join(tpl) for tpl in lotto_split)
['01', '12', '13', '14', '25']
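Since the original goal was to write the numbers to a file, here is a sketch that splits the digit string with plain slicing (an alternative to the zip trick, giving the same pairs) and appends one CSV row per draw. split_pairs and append_draw are hypothetical helpers; the draw date has to come from the page or feed, because date.today() only records the day the script ran.

```python
import csv
import re
from datetime import date


def split_pairs(text):
    """Split a run of digits such as '0112131425' into two-digit strings."""
    # Remove any whitespace in case the element text contains line breaks.
    digits = re.sub(r"\s+", "", text)
    if len(digits) % 2 or not digits.isdigit():
        raise ValueError(f"expected an even-length run of digits, got {text!r}")
    return [digits[i:i + 2] for i in range(0, len(digits), 2)]


def append_draw(path, draw_date, numbers):
    """Append one row (ISO date followed by the numbers) to a CSV file."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([draw_date.isoformat(), *numbers])


# '0112131425' is the element text shown in the post above.
numbers = split_pairs("0112131425")
print(numbers)  # ['01', '12', '13', '14', '25']

# Writing to a file (placeholder date; use the real draw date instead):
#   append_draw("treasure_hunt.csv", date.today(), numbers)
```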
