Jun-18-2024, 12:57 PM
(Jun-13-2024, 09:54 PM)Larz60+ Wrote: you will need to use a scraper that can recognize and click on the audit checkbox, then wait until the new page loads before downloading it. Here are some links that will help:
how to locate elements
Click on Checkbox
pageLoadStrategy
Thanks! I used those sources to create the code below, but I'm getting a blank CSV. Not sure what I'm doing wrong.
import requests
from bs4 import BeautifulSoup
import csv

url = "https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# Each report listing is expected in a div with class "media"
reports = soup.find_all("div", class_="media")

report_data = []
for report in reports:
    title = report.find("h3").get_text(strip=True)
    audit = report.find("span", class_="audit").get_text(strip=True) if report.find("span", class_="audit") else "N/A"
    agency = report.find("span", class_="agency").get_text(strip=True) if report.find("span", class_="agency") else "N/A"
    date = report.find("span", class_="date").get_text(strip=True) if report.find("span", class_="date") else "N/A"
    report_data.append({
        "Title": title,
        "Audit": audit,
        "Agency": agency,
        "Date": date
    })

# Export to CSV
csv_file = "reports_data.csv"
with open(csv_file, mode='w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=["Title", "Audit", "Agency", "Date"])
    writer.writeheader()
    for data in report_data:
        writer.writerow(data)

print(f"Data exported to {csv_file}")
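One likely reason for the blank CSV: requests only fetches the raw HTML and never runs the page's JavaScript, so if the report listing is built client-side, find_all("div", class_="media") matches nothing. A sketch of the Selenium approach Larz60+ described might look like the following; note that the "div.media" / span class names are carried over from the code above and the checkbox locator is a guess, so both need checking against the live page's actual markup.

```python
from bs4 import BeautifulSoup


def parse_reports(html):
    """Extract report rows from the rendered page HTML.
    The div.media / span class names are assumptions; verify them
    by inspecting the live page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for report in soup.find_all("div", class_="media"):
        def text_of(tag, **kwargs):
            found = report.find(tag, **kwargs)
            return found.get_text(strip=True) if found else "N/A"
        rows.append({
            "Title": text_of("h3"),
            "Audit": text_of("span", class_="audit"),
            "Agency": text_of("span", class_="agency"),
            "Date": text_of("span", class_="date"),
        })
    return rows


def fetch_filtered_html():
    """Drive a real browser so the JavaScript-built listing exists
    before parsing. The checkbox XPath below is hypothetical --
    inspect the page to find the real locator."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/")
        # Click the Audit filter checkbox (locator is an assumption).
        driver.find_element(
            By.XPATH, "//input[@type='checkbox' and contains(@id, 'audit')]"
        ).click()
        # Wait until at least one result row exists after the filter applies.
        WebDriverWait(driver, 15).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, "div.media")
        )
        return driver.page_source
    finally:
        driver.quit()
```

With this split, parse_reports(fetch_filtered_html()) would feed the same CSV-writing loop as above, and the parsing can be tested on saved HTML without launching a browser.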