Python Forum
web scraping IMDb database
#1
I am trying to extract information from an IMDb web page, in particular the names and descriptions of the 50 actors listed.
This is my code. It fails with the error: AttributeError: 'NoneType' object has no attribute 'text'.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
# Define the URL
url = 'https://www.imdb.com/list/ls053501318/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'none',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive'
    }
# Send a GET request to the URL
response = requests.get(url, headers=headers)
# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')
# Find the list items containing actor information
actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')
# Extract details for each actor
actors = []
for item in actor_items:
    # Ensure the element for name exists
    name_element = item.find('span', class_='ipc-title__text')
    name = name_element.text.strip() 
    
    # Ensure the element for details exists
    details_element = item.find('span', class_='ipc-html-content-inner-div')
    details = details_element.text.strip()
    
    actors.append({
        'name': name,
        'details': details
    })

# Add a delay to avoid overwhelming the server
time.sleep(1)
#2
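Without digging deep into your code, I noticed one thing.
Line 20:
actor_items = soup.find_all('li', class_='ipc-metadata-list-summary-item')
can also be written with an attribute dictionary:
actor_items = soup.find_all('li', {'class': 'ipc-metadata-list-summary-item'})
Both forms select the same elements in BeautifulSoup, so this change alone may not remove the error.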
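As for the error itself: find() returns None when no element matches, and calling .text on None raises the AttributeError. Here is a minimal sketch of a guard. The sample markup (SAMPLE_HTML) is made up for illustration and the helper text_or_none is a hypothetical name, not something from your script. It also checks that the keyword form and the dictionary form of find_all select the same elements.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the second <li> has no description span.
SAMPLE_HTML = """
<ul>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">1. Actor One</span>
    <span class="ipc-html-content-inner-div">First description.</span>
  </li>
  <li class="ipc-metadata-list-summary-item">
    <span class="ipc-title__text">2. Actor Two</span>
  </li>
</ul>
"""


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = parent.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The keyword form and the attribute-dictionary form select the same elements.
by_keyword = soup.find_all("li", class_="ipc-metadata-list-summary-item")
by_dict = soup.find_all("li", {"class": "ipc-metadata-list-summary-item"})

# Build one record per list item; missing fields become None instead of raising.
actors = [
    {
        "name": text_or_none(item, "span", "ipc-title__text"),
        "details": text_or_none(item, "span", "ipc-html-content-inner-div"),
    }
    for item in by_keyword
]
print(actors)
```

If every record comes back with None values, or the list is empty, the class names in the script probably do not match the HTML that was actually downloaded.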
#3
There was a lot of discussion about this a while ago. Scraping IMDb is not easy.

https://python-forum.io/thread-40671.htm...light=imdb
https://python-forum.io/thread-40687.htm...light=imdb
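One rough way to see why a scrape comes back empty is to count how many elements match each selector in the HTML that was actually received. The sketch below assumes a hypothetical helper, count_matches, and a made-up stand-in page. In the original script you would pass response.text instead.

```python
from bs4 import BeautifulSoup


def count_matches(html, selectors):
    """Map each (tag, css_class) pair to the number of matching elements in html."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        (tag, css_class): len(soup.find_all(tag, class_=css_class))
        for tag, css_class in selectors
    }


# Hypothetical stand-in for a response body that lacks the expected markup.
html = "<html><body><div id='root'></div></body></html>"
selectors = [
    ("li", "ipc-metadata-list-summary-item"),
    ("span", "ipc-title__text"),
]

report = count_matches(html, selectors)
for (tag, css_class), n in report.items():
    print(f"{tag}.{css_class}: {n} match(es)")
```

Zero matches for a selector would suggest that the downloaded page differs from what the browser shows, for example because the class names have changed or the content is filled in by JavaScript after loading.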