Web scraping confusion

keifin · (This post was last modified: Feb-16-2025, 03:34 PM by Larz60+.)

Hello Everyone,

I am new to Python and created a simple web scraper to pull the winning Pick 3 numbers from the Florida Lottery website (I find web scraping interesting and wanted to try it out on a simple page first). It was working fine until the Florida Lottery revamped the website.

I would like to scrape the winning Pick 3 numbers from the new site (link removed); however, when I look at the HTML source code, the numbers are not there. In the developer tools, under "Sources", I see a document named "pick-3" inside the "games/draw-games" folder, and that document does contain the numbers. How would I access the data in this document so that I can scrape the winning numbers? I've been reading that I can use Selenium or JSON, but no matter what I try I keep running into errors. I am running Python on a Raspberry Pi with Bookworm (version 12).

Any help or guidance would be greatly appreciated.

Keifn

Larz60+ write Feb-16-2025, 03:34 PM:
clickbait link removed

vercel · (This post was last modified: Mar-06-2025, 08:54 PM by buran.)

It looks like the Florida Lottery website now loads the Pick 3 winning numbers dynamically with JavaScript, so they are not present in the initial HTML source. There are three things to try. First, open the Network tab in Developer Tools (F12), filter by Fetch/XHR, and reload the page to look for a request that returns the numbers as JSON; if you find one, you can call the same URL directly with Python's requests library. Second, if there is no such request and the numbers only appear after the page's scripts have run, use Selenium to drive a real browser and read them from the rendered page. Third, inspect the files under the Sources tab, since the data may be embedded there in a JSON-like structure that you can extract. If you share the exact URL, I can give more specific guidance on extracting the numbers.
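
As a rough illustration of the JSON route, the sketch below downloads a JSON document and pulls the winning numbers out of it. The endpoint URL and the field names ("draws", "drawDate", "numbers") are placeholders chosen for the example, not the Florida Lottery's real API; substitute whatever the Network tab actually shows. It uses the standard library's urllib so that it runs without extra packages; requests.get(url).json() would do the same job.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the request URL seen in the Network tab.
API_URL = "https://example.com/api/pick-3/latest"


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_draws(payload):
    """Return (date, numbers) pairs from a payload shaped like SAMPLE below.

    The keys "draws", "drawDate" and "numbers" are assumed for illustration.
    """
    return [(draw["drawDate"], draw["numbers"]) for draw in payload.get("draws", [])]


# Made-up sample response, so the parsing step can be tried offline.
SAMPLE = json.loads('{"draws": [{"drawDate": "2025-02-15", "numbers": [1, 2, 3]}]}')

if __name__ == "__main__":
    for date, numbers in extract_draws(SAMPLE):
        print(date, "-".join(str(n) for n in numbers))
    # With a real endpoint: extract_draws(fetch_json(API_URL))
```

Keeping the download and the parsing in separate functions means the parsing can be checked against a saved response before any live request is made.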

buran write Mar-06-2025, 08:54 PM:
clickbait link removed

cspengel · (This post was last modified: Apr-07-2025, 08:17 AM by buran.)

Try this:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time
from bs4 import BeautifulSoup
import logging
import os 

def scrape_pick3_numbers():
    url = "https://www.flalottery.com/games/draw-games/pick-3"
    
    try:
        # Suppress ChromeDriver logging
        logging.getLogger('selenium').setLevel(logging.WARNING)
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--log-level=3")  
        
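        # Send ChromeDriver's own log to the null device.
        # os.devnull is 'nul' on Windows and '/dev/null' on Unix.
        # Note: log_output needs a recent Selenium 4 release; assigning
        # service.log_path after construction appears to have no effect.
        service = Service(log_output=os.devnull)
        
        driver = webdriver.Chrome(service=service, options=chrome_options)
        try:
            driver.get(url)
            time.sleep(3)  # crude pause so the page's JavaScript can render
            html = driver.page_source
        finally:
            # Always close the browser, even if loading the page fails
            driver.quit()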
        
        soup = BeautifulSoup(html, 'html.parser')
        
        draw_dates = soup.find_all('p', class_='draw-date--pick3')
        if not draw_dates:
            print("No draw-date--pick3 elements found in HTML.")
            return None
        
        results = []
        
        for section in draw_dates:
            svg_icon = section.find('svg')
            draw_time = "Midday" if svg_icon and "sun" in svg_icon.get('data-icon', '') else "Evening"
            
            date_text = section.text.strip()
            date_text = date_text.replace("Midday", "").replace("Evening", "").strip()
            if date_text and ', ' in date_text:
                date_text = date_text.split(', ', 1)[1]
            elif not date_text:
                date_text = "Unknown"
            
            number_list = section.find_next('ul', class_='game-numbers--pick3')
            if number_list:
                numbers = [li.find('span').text for li in number_list.find_all('li', class_='game-numbers__number') if li.find('span')]
                fireball = number_list.find('span', class_='game-numbers__bonus-text')
                fireball = fireball.text if fireball else "N/A"
                
                results.append({
                    'draw_time': draw_time,
                    'date': date_text,
                    'numbers': numbers,
                    'fireball': fireball
                })
        
        if not results:
            print("No complete results extracted.")
            return None
        
        latest_results = {}
        for result in results:
            if result['draw_time'] not in latest_results:
                latest_results[result['draw_time']] = result
        
        return list(latest_results.values())
    
    except Exception as e:
        print(f"Error during scraping: {e}")
        return None

def print_results(results):
    if results:
        for data in results:
            print(f"\n{data['draw_time']} Drawing - {data['date']}:")
            print(f"Winning Numbers: {'-'.join(data['numbers'])}")
            print(f"Fireball: {data['fireball']}")
    else:
        print("No results found.")

if __name__ == "__main__":
    
    results = scrape_pick3_numbers()
    print_results(results)
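
One fragile spot in the script above is the fixed time.sleep(3): on a slow connection the page may need longer, and on a fast one the pause is wasted. Polling until the content shows up is usually more robust. Selenium provides WebDriverWait for this purpose; the dependency-free sketch below shows the same idea with a hypothetical helper named wait_until, which is not part of Selenium.

```python
import time


def wait_until(condition, timeout=15.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

In the scraper this could stand in for the sleep, for example wait_until(lambda: 'draw-date--pick3' in driver.page_source), where the class name is the one the script already searches for.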

buran write Apr-07-2025, 08:17 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
