web scraping IMDb database - Printable Version

web scraping IMDb database - guidoguidi - May-24-2024

I am trying to extract information from an IMDb web page, in particular the names and descriptions of the 50 actors listed there.
This is my code. The error it gives me is: 'NoneType' object has no attribute 'text'.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
# Define the URL
url = 'https://www.imdb.com/list/ls053501318/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
    }
# Send a GET request to the URL
response = requests.get(url, headers=headers)
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find the list items containing actor information
actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')
# Extract details for each actor
actors = []
for item in actor_items:
    # Ensure the element for name exists
    name_element = item.find('span', class_='ipc-title__text')
    name = name_element.text.strip() 
    
    # Ensure the element for details exists
    details_element = item.find('span', class_='ipc-html-content-inner-div')
    details = details_element.text.strip()
    
    actors.append({
        'name': name,
        'details': details
    })

# Add a delay to avoid overwhelming the server
time.sleep(1)

RE: web scraping IMDb database - Larz60+ - May-24-2024

Without digging deep into your code, I noticed one thing:
line 20:
actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')
should read:
actor_items = soup.find_all('li', {'class': 'ipc-metadata-list-summary-item'})
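
For anyone comparing the two call styles: as far as I know, BeautifulSoup treats the class_ keyword and a {'class': ...} dictionary passed as the second argument the same way, so both should select the same elements. Below is a minimal sketch to check that. SAMPLE_HTML is made-up markup for illustration, not the real IMDb page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption), not copied from IMDb.
SAMPLE_HTML = """
<ul>
  <li class="ipc-metadata-list-summary-item">first</li>
  <li class="ipc-metadata-list-summary-item other">second</li>
  <li class="unrelated">third</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Style used in the original post: the class_ keyword argument.
by_keyword = soup.find_all("li", class_="ipc-metadata-list-summary-item")

# Style suggested in the reply: an attribute dictionary as the second argument.
by_dict = soup.find_all("li", {"class": "ipc-metadata-list-summary-item"})

# Compare the text of the matched elements from both calls.
keyword_texts = [li.get_text() for li in by_keyword]
dict_texts = [li.get_text() for li in by_dict]
print(keyword_texts == dict_texts)
```

If both lists match on the real page too, the cause of the error is probably somewhere other than this line.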

RE: web scraping IMDb database - deanhystad - May-24-2024

There was a lot of discussion about this a while ago. Scraping IMDb is not easy; see these earlier threads:

https://python-forum.io/thread-40671.html?highlight=imdb
https://python-forum.io/thread-40687.html?highlight=imdb
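
On the error itself: 'NoneType' object has no attribute 'text' means that find() returned None for at least one item, so .text was called on nothing. Below is a minimal sketch of guarding against that. It runs on made-up markup rather than the live page, and SAMPLE_HTML, text_or_none and extract_actors are hypothetical names used only for illustration. The selectors are the ones from the first post and may not match what IMDb actually serves.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption); the second item lacks a details element
# on purpose, to reproduce the situation that raises the error.
SAMPLE_HTML = """
<ul>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">1. Actor One</span>
    <span class="ipc-html-content-inner-div">First description.</span>
  </li>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">2. Actor Two</span>
  </li>
</ul>
"""


def text_or_none(parent, tag, class_name):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = parent.find(tag, class_=class_name)
    if element is None:
        return None
    return element.get_text(strip=True)


def extract_actors(html):
    """Collect name and details for every list item, tolerating missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    actors = []
    for item in soup.find_all("li", class_="ipc-metadata-list-summary-item"):
        actors.append({
            "name": text_or_none(item, "span", "ipc-title__text"),
            "details": text_or_none(item, "span", "ipc-html-content-inner-div"),
        })
    return actors


actors = extract_actors(SAMPLE_HTML)
for actor in actors:
    print(actor)
```

With the real page, a None in the output shows which selector fails to match, which is a better starting point than the traceback. It is also worth checking response.status_code and len(actor_items) before parsing, in case the page returned is not the one expected.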