Jun-18-2024, 12:57 PM
(Jun-13-2024, 09:54 PM)Larz60+ Wrote: you will need to use a scraper that can recognize and click on the audit checkbox, then wait until the new page loads before downloading it. Here are some links that will help:
how to locate elements
Click on Checkbox
pageLoadStrategy
Thanks! I used those sources to create the code below, but I'm getting a blank CSV. Not sure what I'm doing wrong.
import requests
from bs4 import BeautifulSoup
import csv

url = "https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Each report listing is expected in a div with class "media"
reports = soup.find_all("div", class_="media")

report_data = []
for report in reports:
    title = report.find("h3").get_text(strip=True)
    audit = report.find("span", class_="audit").get_text(strip=True) if report.find("span", class_="audit") else "N/A"
    agency = report.find("span", class_="agency").get_text(strip=True) if report.find("span", class_="agency") else "N/A"
    date = report.find("span", class_="date").get_text(strip=True) if report.find("span", class_="date") else "N/A"
    report_data.append({
        "Title": title,
        "Audit": audit,
        "Agency": agency,
        "Date": date
    })

# Export to CSV
csv_file = "reports_data.csv"
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=["Title", "Audit", "Agency", "Date"])
    writer.writeheader()
    for data in report_data:
        writer.writerow(data)

print(f"Data exported to {csv_file}")
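One likely reason for the blank CSV: requests only fetches the raw HTML and never runs the page's JavaScript, so if the report listing is built client-side, find_all("div", class_="media") matches nothing. A sketch of the Selenium approach Larz60+ described might look like the following; note that the "div.media" / span class names are carried over from the code above and the checkbox locator is a guess, so both need checking against the live page's actual markup.

```python
from bs4 import BeautifulSoup


def parse_reports(html):
    """Extract report rows from the rendered page HTML.
    The div.media / span class names are assumptions; verify them
    by inspecting the live page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for report in soup.find_all("div", class_="media"):
        def text_of(tag, **kwargs):
            found = report.find(tag, **kwargs)
            return found.get_text(strip=True) if found else "N/A"
        rows.append({
            "Title": text_of("h3"),
            "Audit": text_of("span", class_="audit"),
            "Agency": text_of("span", class_="agency"),
            "Date": text_of("span", class_="date"),
        })
    return rows


def fetch_filtered_html():
    """Drive a real browser so the JavaScript-built listing exists
    before parsing. The checkbox XPath below is hypothetical --
    inspect the page to find the real locator."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/")
        # Click the Audit filter checkbox (locator is an assumption).
        driver.find_element(
            By.XPATH, "//input[@type='checkbox' and contains(@id, 'audit')]"
        ).click()
        # Wait until at least one result row exists after the filter applies.
        WebDriverWait(driver, 15).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, "div.media")
        )
        return driver.page_source
    finally:
        driver.quit()
```

With this split, parse_reports(fetch_filtered_html()) would feed the same CSV-writing loop as above, and the parsing can be tested on saved HTML without launching a browser.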