Python Forum
Can't figure out how to scrape grid
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Can't figure out how to scrape grid
#1
I have the below code to scrape this site (https://oig.hhs.gov/reports-and-publicat...lications/) and extract to a CSV.

It works well to scrape each title and URL. I want it to also scrape the content under "audit", "HHS agency" and "date" for each title, but I can't seem to code it right given all three elements are in a grid.

Any suggestions? Thanks

import requests
from bs4 import BeautifulSoup
import csv
from urllib.parse import urljoin

base_url = "https://oig.hhs.gov"
url = "https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/?page={}"

titles_with_urls = []
page = 1
max_pages = 100

while page <= max_pages:
    print(f"Scraping page {page}...")
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.content, "html.parser")

    # Check if there are titles on the page
    page_titles = soup.select("h2 a")
    if not page_titles:
        print("No more pages to scrape. Exiting...")
        break  # Exit the loop if no titles are found, assuming no more pages

    for title in page_titles:
        title_text = title.text.strip()
        url_link = title.get('href')  # Get the URL link from the 'href' attribute

        # Ensure the URL starts with base_url if it's a relative URL
        if url_link.startswith("/"):
            full_url = urljoin(base_url, url_link)
        else:
            full_url = url_link

        titles_with_urls.append([title_text, full_url])
        print(f"Scraped title: {title_text}, URL: {full_url}")

    page += 1

# Write titles and URLs to a CSV file
with open('titles_with_urls.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(["Title", "URL"])  # Write header
    writer.writerows(titles_with_urls)

print("All titles and URLs have been scraped and saved to titles_with_urls.csv.")
Reply


Messages In This Thread
Can't figure out how to scrape grid - by templeowls - Jun-13-2024, 05:14 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  scrape data 1 go to next page scrape data 2 and so on alkaline3 6 5,601 Mar-13-2020, 07:59 PM
Last Post: alkaline3

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020