Python Forum
Code Help, web scraping non uniform lists(ul)
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Code Help, web scraping non uniform lists(ul)
#1
Hi,

I am writing this code for scraping off of a website into an excel spreadsheet, i am having an issue where the website doesn't use a list of the same length and so it means that I get an attribute error for the find_next function, wondering if anyone knows of a workaround.
My coding is a bit of mess

import requests
from bs4 import BeautifulSoup
import pandas as pd
page_number = 1

url = 'https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page='

agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
car_spec = []
car_age = []
car_style = []
car_mileage =[]
car_engine_size =[]
car_BHP = []
price_lst = []
car_detail = []
car_gearbox_style = []
car_fuel_type = []
car_next=[]
while page_number < 100:
    all_car = []
    page_number += 1
    pg_no = str(page_number)
    print(page_number)
    url2= url+pg_no
    response = requests.get(url2, headers=agent)
    soup = BeautifulSoup(response.content, 'lxml')
    car_elements = soup.find_all('div', class_='product-card-content__car-info')
    for tag in car_elements:
        price = tag.find('div', class_='product-card-pricing__price')
        price_lst.append(price.text.strip())
    for tag in car_elements:
        car = tag.find('h3', class_='product-card-details__title')
        car_detail.append(car.text.strip())
    for tag in car_elements:
        car = tag.find('li', class_='atc-type-picanto--medium')
        if car is None:
            car='0'
            car_age.append(car)
        else:
            car_age.append(car.text)
        car = tag.find('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium')
        if car is None:
            car='0'
            car_style.append(car)
        else:
            car_style.append(car.text)
        car = tag.find('li', class_ ='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium')
        if car is None:
            car='0'
            car_mileage.append(car)
        else:
            car_mileage.append(car.text)
        car = tag.find('li',class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium')
        if car is None:
            car='0'
            car_engine_size.append(car)
        else:
            car_engine_size.append(car.text)
        car= tag.find('li', class_ ='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium')
        if car is None:
            car='0'
            car_gearbox_style.append(car)
        else:
            car_gearbox_style.append(car.text)
        car = tag.find('li', class_ ='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium').find_next('li', class_='atc-type-picanto--medium')
        if car is None:
            car='0'
            car_fuel_type.append(car)
        else:
            car_fuel_type.append(car.text)

    
    all_car = zip(car_detail, price_lst, car_age,car_style,car_mileage,car_engine_size,car_gearbox_style,car_fuel_type)

    
# Create the pandas DataFrame
df = pd.DataFrame(all_car)
df.to_excel("car_info.xlsx", index=False, sheet_name='car_info')
Reply


Messages In This Thread
Code Help, web scraping non uniform lists(ul) - by luke_m - Apr-21-2021, 04:38 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  scraping code misses listings kolarmi19 0 1,076 Jan-27-2023, 10:00 AM
Last Post: kolarmi19
  scraping code nexuz89 0 1,527 Sep-28-2020, 12:16 PM
Last Post: nexuz89
  In need of web scraping code! kolbyng 1 1,754 Sep-21-2020, 06:02 AM
Last Post: buran
  error in code web scraping alexisbrunaux 5 3,836 Aug-19-2020, 02:31 AM
Last Post: alexisbrunaux
  scraping from a website that hides source code PIWI_Protein 1 1,991 Mar-27-2020, 05:08 PM
Last Post: Larz60+
  Web Scraping, Merging two lists and getting data from various dates? AgileAVS 0 1,870 Feb-07-2020, 04:05 PM
Last Post: AgileAVS

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020