Python Forum
Code Help
#1
Hi,
I am trying to scrape some data off a website and then export that data into an Excel file (.xlsx), but I am struggling to find the file, or else it isn't writing to the file at all.
There's also another issue: the value of n resets for each page of the website that it goes through.

import requests
import os
from bs4 import BeautifulSoup as BS
import xlsxwriter
page_number=0
page_no=str(page_number)
workbook.write.xlsx(df,"C:\Users\lukem\Desktop\datacollection", col_names = TRUE)
workbook = xlsxwriter.Workbook('data_collection')
worksheet = workbook.add_worksheet()
n = 1
car_entry=[]

URL = ('https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page=')
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

while page_number < 5:
    n=n
    page_number = page_number+1 
    page=requests.get(URL + page_no, headers=agent)
    print(page_number)
    soup =BS(page.text,'html.parser')

    car_elements = soup.find_all('div', class_='product-card-content__car-info')

    for element in car_elements:
        m = 0 
        age_element = soup.find('div', class_='product-card-pricing__price')
        age_t = age_element.text
        worksheet.write(n,m,age_t)
        m=m+1
        name_element = soup.find('h3', class_='product-card-details__title')
        name_t = name_element.text
        worksheet.write(n,m,name_t)
        m=m+1
        key_spec_elements = soup.find('ul', class_='listing-key-specs')
        spec_list = key_spec_elements.text.split()
        b=1
        while b < len(spec_list):
            worksheet.write(n,m,spec_list[b])
            b = b+1

    car_entry.append([n,((spec_list,age_element,name_element))])
    print(car_entry)
    n = n + 1
    n=n 
    workbook.close()

#car_entry = [n, (spec_list, age_element, name_element)]
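You can work out from the code above why the file doesn't show up where expected. `xlsxwriter.Workbook('data_collection')` is given a bare relative name, so the file is created in the current working directory of the running process (not necessarily the script's folder or the Desktop), under exactly that name with no `.xlsx` extension. The `workbook.write.xlsx(...)` line looks like R rather than Python. Its path literal `"C:\Users\lukem\Desktop\datacollection"` is also a Python `SyntaxError` on its own, because `\U` starts a Unicode escape, so the script as posted would stop before running anything. A minimal standard-library sketch follows. `resolve_output_path` is a hypothetical helper, and `Path.home() / "Desktop"` assumes the Desktop folder sits directly under the user's home folder (it may not if it is redirected, e.g. to OneDrive).

```python
from pathlib import Path


def resolve_output_path(filename, folder=None):
    """Return the absolute path a workbook called `filename` would be saved to.

    folder=None mimics passing a bare relative name to the writer: the file
    lands in the current working directory. A missing ".xlsx" is appended.
    """
    path = Path(filename)
    if path.suffix.lower() != ".xlsx":
        path = path.with_name(path.name + ".xlsx")  # keep the name, add the extension
    base = Path.cwd() if folder is None else Path(folder)
    return (base / path).resolve()


# A bare name resolves against the current working directory.
print(resolve_output_path("data_collection"))

# An explicit folder. Path.home() / "Desktop" avoids hand-typed backslashes;
# a raw string such as r"C:\Users\lukem\Desktop" would also work on Windows.
print(resolve_output_path("datacollection", Path.home() / "Desktop"))
```

With XlsxWriter, the same path can be passed in as a string, e.g. `xlsxwriter.Workbook(str(resolve_output_path("data_collection")))`, and printing it tells you exactly where to look. Also note that `workbook.close()` is indented inside the `while` loop in the posted code, so the workbook would be closed after the first page. It should be called once, after the loop.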
buran wrote Apr-14-2021, 06:16 PM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
#2
(Apr-14-2021, 05:15 PM) luke_m wrote: I am struggling to find the file, or else it isn't writing to the file at all.
There's also another issue: the value of n resets for each page of the website that it goes through.
You should take a step back: don't loop over pages or write to a file until everything works in a test on a single page.
As an example, I use pandas: once the data is in a DataFrame it already looks like an Excel sheet.
When it looks the way you want, just call df.to_excel().
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page=1'
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
response = requests.get(url, headers=agent)
soup = BeautifulSoup(response.content, 'lxml')
car_elements = soup.find_all('div', class_='product-card-content__car-info')

price_lst = []
for tag in car_elements:
    price = tag.find('div', class_='product-card-pricing__price')
    price_lst.append(price.text.strip())
car_detail = []
for tag in car_elements:
    car = tag.find('h3', class_='product-card-details__title')
    car_detail.append(car.text.strip())

all_car = zip(car_detail, price_lst)
# Create the pandas DataFrame
df = pd.DataFrame(all_car, columns=['Name', 'price'])
df.to_excel("car_info.xlsx", index=False, sheet_name='car_info')
Output:
>>> df
                 Name   price
0      Vauxhall Astra  £6,495
1   Vauxhall Insignia    £295
2          MINI Hatch  £6,800
3        BMW 1 SERIES  £1,000
4         Mazda Eunos  £1,200
5   MINI Hatch Cooper  £1,995
6        Lexus GS 300    £570
7      Renault Megane  £1,695
8           Volvo C70    £600
9      Vauxhall Corsa  £2,195
10           Rover 25    £620
11      Vauxhall Adam  £3,795
12     Vauxhall Astra  £6,495
[Image: okNobQ.png]
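Once the single-page version produces the table you want, the loop over pages can go back in. This is a minimal sketch that assumes the class names used in this thread are still what the site serves. The helper names (`fetch_html`, `parse_cars`, `collect_rows`, `save_to_excel`) and the extra `Row`/`Page` columns are hypothetical. The row number is taken from the length of one shared `rows` list, so it keeps counting across pages instead of starting over. The URL is rebuilt with the current page number on every pass (in the first post `page_no` is computed once, before the loop). Each field is looked up inside its own card rather than on the whole page. `requests`, `bs4` and `pandas` are imported inside the functions that use them, and `'html.parser'` replaces `'lxml'`, so the page loop can be tried offline with stand-in functions.

```python
from typing import Callable, Iterable, List, Tuple

# Same search URL and User-Agent as in the posts above, minus the page number.
BASE_URL = (
    "https://www.autotrader.co.uk/car-search?advertClassification=standard"
    "&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New"
    "&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page="
)
AGENT = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}

Row = Tuple[int, int, str, str]  # (running row number, page, name, price)


def fetch_html(page_number: int) -> str:
    """Download one results page; the URL is rebuilt for every page number."""
    import requests  # imported here so collect_rows can be tried offline
    response = requests.get(BASE_URL + str(page_number), headers=AGENT, timeout=30)
    response.raise_for_status()
    return response.text


def parse_cars(html: str) -> List[Tuple[str, str]]:
    """Return (name, price) for each car card on one page."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    cars = []
    for card in soup.find_all("div", class_="product-card-content__car-info"):
        # Search inside this card; soup.find() would return the first car every time.
        name = card.find("h3", class_="product-card-details__title")
        price = card.find("div", class_="product-card-pricing__price")
        if name is not None and price is not None:
            cars.append((name.text.strip(), price.text.strip()))
    return cars


def collect_rows(page_numbers: Iterable[int],
                 fetch: Callable[[int], str] = fetch_html,
                 parse: Callable[[str], List[Tuple[str, str]]] = parse_cars) -> List[Row]:
    """Scrape several pages into one list whose row number never restarts."""
    rows: List[Row] = []
    for page_number in page_numbers:
        for name, price in parse(fetch(page_number)):
            # len(rows) + 1 keeps counting across pages (the n that "reset").
            rows.append((len(rows) + 1, page_number, name, price))
    return rows


def save_to_excel(rows: List[Row], path: str = "car_info.xlsx"):
    """Write everything once, after the loop (needs pandas plus openpyxl or XlsxWriter)."""
    import pandas as pd
    df = pd.DataFrame(rows, columns=["Row", "Page", "Name", "Price"])
    df.to_excel(path, index=False, sheet_name="car_info")
    return df
```

Against the live site, `save_to_excel(collect_rows(range(1, 6)))` would request pages 1 to 5 (what the `while page_number < 5` loop was aiming for) and write everything to a single car_info.xlsx in one go. Passing stand-in `fetch` and `parse` functions lets you check the numbering without touching the network.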