Python Forum
Need help opening pages when web scraping
#1
I have the code below, which scrapes this page: https://www.eeoc.gov/newsroom/search.

It works well, but I also want it to open each URL and scrape the full text of each linked page. Any suggestions on how to modify the code to achieve this?

import csv
import requests
from bs4 import BeautifulSoup

def scrape_eec_news():
    base_url = "https://www.eeoc.gov/newsroom/search?page="
    results = []
    page_number = 0
    
    while True:
        page_number += 1
        url = base_url + str(page_number)
        response = requests.get(url, timeout=30)  # avoid hanging forever on a stalled connection
        response.raise_for_status()

        soup = BeautifulSoup(response.content, "html.parser")
        entries = soup.find_all("div", class_="views-row")
        if not entries:
            break

        print("Scraping page", page_number)  # Print the page number

        for entry in entries:
            title_elem = entry.h2
            description_elem = entry.p
            date_elem = entry.find("div", class_="field--type-datetime")
            url_elem = entry.a

            # Guard every lookup: any of these elements may be missing from an entry
            title = title_elem.text.strip() if title_elem else ""
            description = description_elem.text.strip() if description_elem else ""
            date = date_elem.text.strip() if date_elem else ""
            url = url_elem["href"] if url_elem else ""
            # Add the 'agency' column with the value "United States Equal Employment Opportunity Commission"
            results.append(
                {
                    "title": title,
                    "description": description,
                    "date": date,
                    "url": url,
                    "agency": "United States Equal Employment Opportunity Commission"
                }
            )
    
    return results

def export_to_csv(data, filename):
    with open(filename, "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = ["title", "description", "date", "url", "agency"]  # Include 'agency' in the fieldnames
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

if __name__ == "__main__":
    news_entries = scrape_eec_news()
    export_to_csv(news_entries, "eec_news.csv")
    print("Data exported to eec_news.csv")
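
One possible way to follow each link is sketched below. It is only a sketch under stated assumptions, not a tested solution for this site: the names `SITE_ROOT`, `extract_full_text`, `fetch_full_text` and `add_full_text` are hypothetical, the `href` values are assumed to be possibly relative (hence `urljoin`), and the article body is assumed to sit inside an `<article>` or `<main>` element, which should be checked against a real page.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SITE_ROOT = "https://www.eeoc.gov"  # assumption: base used to resolve relative hrefs


def extract_full_text(html):
    """Return the readable text of one page.

    Assumption: the body lives in <article> or <main>; if neither
    exists, fall back to <body> or the whole document.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are never readable text, so drop them.
    for tag in soup(["script", "style"]):
        tag.decompose()
    container = soup.find("article") or soup.find("main") or soup.body or soup
    return container.get_text(separator=" ", strip=True)


def fetch_full_text(href, session, delay=1.0):
    """Download one linked page and return (absolute_url, text)."""
    full_url = urljoin(SITE_ROOT, href)
    response = session.get(full_url, timeout=30)
    response.raise_for_status()
    time.sleep(delay)  # small pause between requests to be polite to the server
    return full_url, extract_full_text(response.text)


def add_full_text(entries, session=None):
    """Add a 'full_text' key to each dict returned by the listing scraper."""
    session = session or requests.Session()
    for entry in entries:
        entry["url"], entry["full_text"] = fetch_full_text(entry["url"], session)
    return entries
```

To use it, call `add_full_text(news_entries)` before exporting, and add `"full_text"` to `fieldnames` in `export_to_csv`; otherwise `csv.DictWriter` raises a `ValueError` for the extra key.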


Messages In This Thread
Need help opening pages when web scraping - by templeowls - Feb-26-2024, 08:16 PM
