Python Forum
Simple screen scrape is baffling me.
#1
Greetings,

I am just learning screen scraping with Beautiful Soup. I have been doing well so far, and thought I would write a simple script that gets the daily numbers from a lottery website and writes them to a file. It should be simple, but for the life of me I cannot scrape the numbers. My code is below. If anyone could suggest how to get the data (the date and the numbers), I would greatly appreciate it.


from bs4 import BeautifulSoup
import requests

url = 'https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
details_div = soup.find('div', {'class': 'details'})
balls = details_div.find_all('span', {'class': 'ball'})
for ball in balls:
    print(ball.text.strip())
#2
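This is a common problem when scraping: the content you want is generated by JavaScript.
Ordinary scraping methods like the one you use here will then not work, because the HTML that Requests downloads does not contain the numbers yet.

It may not be so easy to spot if you are new to this, but one way is to look for the JSON response the page loads and get the data from there.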
import requests

response = requests.get('https://www.palottery.state.pa.us/Custom/feeds/DrawingsData.aspx?game=treasure%20hunt')
js = response.json()[10]
lotto_number = js['DrawingNumbersAsList']
>>> lotto_number
['01', '12', '13', '14', '25']
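Indexing the response directly raises an error if the feed returns fewer entries or a different shape. Here is a minimal sketch of a more defensive lookup. It assumes only what the snippet above shows: that the feed is a list of dicts with a 'DrawingNumbersAsList' key. The function name extract_numbers and the sample data are made up for illustration and are not real feed output.

```python
def extract_numbers(drawings, index=0):
    """Return the drawn numbers for one entry, or None if it is missing."""
    # The feed is assumed to be a list; anything else is treated as "no data".
    if not isinstance(drawings, list):
        return None
    # Guard against an index that is outside the list.
    if not -len(drawings) <= index < len(drawings):
        return None
    entry = drawings[index]
    if not isinstance(entry, dict):
        return None
    # .get() returns None instead of raising KeyError if the key is absent.
    return entry.get('DrawingNumbersAsList')


# Hypothetical sample, shaped like the feed is assumed to be.
sample = [
    {'DrawingNumbersAsList': ['01', '12', '13', '14', '25']},
    {'DrawingNumbersAsList': ['03', '07', '19', '22', '30']},
]

print(extract_numbers(sample, 0))
print(extract_numbers(sample, 10))
```

With the real feed you would pass in response.json() instead of sample, and check for None before using the result.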
Another way is to use Selenium or Playwright.
Here is an example with Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = 'https://www.palottery.state.pa.us/Draw-Games/Treasure-Hunt.aspx'
browser.get(url)
time.sleep(2)
lotto_number = browser.find_element(By.CSS_SELECTOR, 'div.details')
>>> lotto_number.text
'0112131425'
# Have to do some splitting up
>>> lotto_split = zip(*[lotto_number.text[i::2] for i in range(2)])
>>> lotto_split
<zip object at 0x000001D7993D6E00>
>>> list(''.join(tpl) for tpl in lotto_split)
['01', '12', '13', '14', '25']
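The zip trick works, but the same split can be written more directly by slicing the string two characters at a time. A small sketch, assuming every number is exactly two digits as in the output above (split_pairs is just an illustrative name):

```python
def split_pairs(text, size=2):
    """Split a string into consecutive chunks of `size` characters."""
    # Step through the string in strides of `size` and slice out each chunk.
    return [text[i:i + size] for i in range(0, len(text), size)]


print(split_pairs('0112131425'))
# ['01', '12', '13', '14', '25']
```

Unlike the zip version, this keeps a trailing odd character as a final short chunk instead of silently dropping it.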