web scraping IMDb database - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: web scraping IMDb database (/thread-42189.html)
web scraping IMDb database - guidoguidi - May-24-2024

I am trying to extract information from an IMDb webpage, in particular the names and descriptions of the 50 actors listed there. This is my code; the error it gives me is: 'NoneType' object has no attribute 'text'.

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    import time
    # Define the URL
    url = 'https://www.imdb.com/list/ls053501318/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'none',
        'Accept-Language': 'en-US,en;q=0.8',
        'Connection': 'keep-alive'
    }
    # Send a GET request to the URL
    response = requests.get(url, headers=headers)
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find the list items containing actor information
    actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')
    # Extract details for each actor
    actors = []
    for item in actor_items:
        # Ensure the element for name exists
        name_element = item.find('span', class_='ipc-title__text')
        name = name_element.text.strip()
        # Ensure the element for details exists
        details_element = item.find('span', class_='ipc-html-content-inner-div')
        details = details_element.text.strip()
        actors.append({
            'name': name,
            'details': details
        })
        # Add a delay to avoid overwhelming the server
        time.sleep(1)

RE: web scraping IMDb database - Larz60+ - May-24-2024

Without digging deep into your code, I noticed one thing. Line 20:

    actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')

should read:

    actor_items = soup.find_all('li', {'class': 'ipc-metadata-list-summary-item'})

RE: web scraping IMDb database - deanhystad - May-24-2024

There was a lot of discussion about this a while ago. Scraping IMDb is not easy.

https://python-forum.io/thread-40671.html?highlight=imdb
https://python-forum.io/thread-40687.html?highlight=imdb
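
On the error from the first post: `'NoneType' object has no attribute 'text'` means that one of the `find()` calls returned `None`, which BeautifulSoup does whenever no matching tag exists inside the item. The comments in the posted code say "Ensure the element ... exists", but nothing actually checks it. Below is a minimal sketch of such a check. It runs against a small hand-written HTML string rather than the live page, so `SAMPLE_HTML`, `text_or_default` and `extract_actors` are hypothetical names made up for illustration. The class names are copied from the original post; whether they match the HTML that IMDb actually serves is an assumption that has not been verified here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not real IMDb HTML). The second <li> has no
# details <span>, which is the kind of gap that makes find() return None.
SAMPLE_HTML = """
<ul>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">1. Actor One</span>
    <span class="ipc-html-content-inner-div">Known for drama.</span>
  </li>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">2. Actor Two</span>
  </li>
</ul>
"""


def text_or_default(element, default=""):
    """Return the stripped text of a tag, or `default` if find() gave None."""
    if element is None:
        return default
    return element.get_text(strip=True)


def extract_actors(soup):
    """Collect name/details pairs, tolerating missing child elements."""
    actors = []
    for item in soup.find_all("li", class_="ipc-metadata-list-summary-item"):
        name = text_or_default(item.find("span", class_="ipc-title__text"))
        details = text_or_default(
            item.find("span", class_="ipc-html-content-inner-div")
        )
        actors.append({"name": name, "details": details})
    return actors


actors = extract_actors(BeautifulSoup(SAMPLE_HTML, "html.parser"))
for actor in actors:
    print(actor)
```

With this guard, an item that lacks one of the spans yields an empty string instead of raising. If every field comes back empty on the real page, that would suggest the selectors do not match the HTML in `response.content`, and printing part of the response is a reasonable next step.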