May-24-2024, 04:44 PM
(This post was last modified: May-24-2024, 04:44 PM by guidoguidi.)
I am trying to extract information from an IMDb webpage, in particular the names and descriptions of the 50 actors listed.
This is my code. The error it gives me is: 'NoneType' object has no attribute 'text'.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

# Define the URL
url = 'https://www.imdb.com/list/ls053501318/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
}

# Send a GET request to the URL
response = requests.get(url, headers=headers)

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the list items containing actor information
actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')

# Extract details for each actor
actors = []
for item in actor_items:
    # Ensure the element for name exists
    name_element = item.find('span', class_='ipc-title__text')
    name = name_element.text.strip()

    # Ensure the element for details exists
    details_element = item.find('span', class_='ipc-html-content-inner-div')
    details = details_element.text.strip()

    actors.append({
        'name': name,
        'details': details
    })

    # Add a delay to avoid overwhelming the server
    time.sleep(1)
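For reference, the error itself comes from `item.find(...)` returning `None` when nothing matches the given tag and class, so the following `.text` fails. Below is a minimal sketch of how the loop could guard against that. It assumes the same tag and class names as the code above (which may not match what the page actually serves), and the helper names `text_or_none` and `extract_actors`, as well as the `sample_html` snippet, are made up for illustration so the logic can be tried without a network request.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first matching child, or None if absent."""
    element = parent.find(tag, class_=css_class)
    # find() returns None when no element matches, so check before reading text
    return element.get_text(strip=True) if element is not None else None


def extract_actors(html):
    """Collect name/details dicts from list items, tolerating missing elements."""
    soup = BeautifulSoup(html, 'html.parser')
    actors = []
    for item in soup.find_all('li', class_='ipc-metadata-list-summary-item'):
        name = text_or_none(item, 'span', 'ipc-title__text')
        details = text_or_none(item, 'span', 'ipc-html-content-inner-div')
        if name is None:
            # Skip items without a name instead of raising AttributeError
            continue
        actors.append({'name': name, 'details': details})
    return actors


# Hypothetical markup for illustration only; the second item has no details span
sample_html = """
<ul>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">1. Actor One</span>
    <span class="ipc-html-content-inner-div">First description.</span>
  </li>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">2. Actor Two</span>
  </li>
</ul>
"""

print(extract_actors(sample_html))
```

If every item comes back with `details` set to `None`, or the list is empty, the selectors probably do not match the HTML that `requests` received. Printing `response.status_code` and part of `response.text` is one way to check what was actually returned.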