Code Help
Python Forum (https://python-forum.io), Web Scraping & Web Development

Code Help - luke_m - Apr-14-2021

Hi,
I am trying to scrape some data from a website and export it to an Excel file (.xlsx), but I can't find the file, or it isn't being written at all.
There's also another issue: the value of n resets on each page of the website that the script goes through.

import requests
import os
from bs4 import BeautifulSoup as BS
import xlsxwriter
page_number=0
page_no=str(page_number)
workbook.write.xlsx(df,"C:\Users\lukem\Desktop\datacollection", col_names = TRUE)
workbook = xlsxwriter.Workbook('data_collection')
worksheet = workbook.add_worksheet()
n = 1
car_entry=[]

URL = ('https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page=')
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}

while page_number < 5:
    n=n
    page_number = page_number+1 
    page=requests.get(URL + page_no, headers=agent)
    print(page_number)
    soup =BS(page.text,'html.parser')

    car_elements = soup.find_all('div', class_='product-card-content__car-info')

    for element in car_elements:
        m = 0 
        age_element = soup.find('div', class_='product-card-pricing__price')
        age_t = age_element.text
        worksheet.write(n,m,age_t)
        m=m+1
        name_element = soup.find('h3', class_='product-card-details__title')
        name_t = name_element.text
        worksheet.write(n,m,name_t)
        m=m+1
        key_spec_elements = soup.find('ul', class_='listing-key-specs')
        spec_list = key_spec_elements.text.split()
        b=1
        while b < len(spec_list):
            worksheet.write(n,m,spec_list[b])
            b = b+1

    car_entry.append([n,((spec_list,age_element,name_element))])
    print(car_entry)
    n = n + 1
    n=n 
    workbook.close()

#car_entry = [n, (spec_list, age_element, name_element)]
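For reference, here is a minimal sketch of the question's own xlsxwriter approach with the main problems fixed. As posted, the workbook.write.xlsx(...) line looks like R code and cannot run in Python (the \U in "C:\Users..." is a SyntaxError, so the script stops before doing anything). Workbook('data_collection') creates a file with no .xlsx extension in whatever directory the script is run from. page_no is computed once, so every request asks for page=0. soup.find always returns the first card on the page. n only advances once per page, so every car on a page overwrites the same row. The spec values all land in one column because m is never incremented in the inner loop. Finally, workbook.close() runs inside the loop. The sketch assumes the class names from the thread still match the site and that each key spec is an <li>; the helper names and the Desktop output path are illustrative only.

```python
from pathlib import Path

# Search URL and User-Agent copied from the question; the page number is
# appended to the trailing "page=" on every request.
BASE_URL = ('https://www.autotrader.co.uk/car-search?advertClassification=standard'
            '&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New'
            '&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE'
            '&include-delivery-option=on&page=')
AGENT = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}


def page_url(page_number):
    """Build the URL inside the loop, so each request asks for a new page."""
    return BASE_URL + str(page_number)


def card_to_row(card):
    """Turn one car card into [price, name, spec1, spec2, ...].

    Searches inside this card (card.find), not the whole page (soup.find),
    which would return the first card every time. Assumes each key spec
    is an <li> inside the 'listing-key-specs' list.
    """
    price = card.find('div', class_='product-card-pricing__price')
    name = card.find('h3', class_='product-card-details__title')
    specs = card.find('ul', class_='listing-key-specs')
    row = [price.get_text().strip() if price is not None else '',
           name.get_text().strip() if name is not None else '']
    if specs is not None:
        row.extend(li.get_text().strip() for li in specs.find_all('li'))
    return row


def write_rows(worksheet, rows, start_row):
    """Write each car on its own row and return the next free row.

    The caller keeps this value between pages, so numbering carries on
    instead of starting again on every page.
    """
    row_index = start_row
    for row in rows:
        for col, value in enumerate(row):  # one column per value
            worksheet.write(row_index, col, value)
        row_index += 1
    return row_index


def main(pages=5):
    # Third-party imports live here so the helpers above can be tried
    # without network access or these packages installed.
    import requests
    import xlsxwriter
    from bs4 import BeautifulSoup

    # Hypothetical location: an absolute path with an .xlsx extension,
    # so the file is easy to find (the Desktop folder must already exist).
    out_path = Path.home() / 'Desktop' / 'data_collection.xlsx'
    workbook = xlsxwriter.Workbook(str(out_path))
    worksheet = workbook.add_worksheet()
    worksheet.write_row(0, 0, ['Price', 'Name', 'Key specs'])

    next_row = 1  # defined once, outside the page loop
    for page_number in range(1, pages + 1):
        response = requests.get(page_url(page_number), headers=AGENT, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        cards = soup.find_all('div', class_='product-card-content__car-info')
        next_row = write_rows(worksheet, (card_to_row(c) for c in cards), next_row)

    workbook.close()  # close once, after every page has been written
    print('Saved to', out_path)
```

Calling main() fetches pages 1 to 5 and prints the full path of the saved workbook. Returning the next free row from write_rows, instead of resetting a counter inside the page loop, is what lets rows from later pages follow on from earlier ones.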

RE: Code Help - snippsat - Apr-15-2021

(Apr-14-2021, 05:15 PM) luke_m Wrote: I am struggling to find the file or it either isn't writing to the file.
There's also another issue with, that the value of n resets per page of the website that it goes through.

You should take a step back: don't loop or write to a file until everything works in a single-page test.
I can give an example. I use pandas: once the data is in a DataFrame, it looks like an Excel sheet.
When it looks the way you want, just call df.to_excel().

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page=1'
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
response = requests.get(url, headers=agent)
soup = BeautifulSoup(response.content, 'lxml')
car_elements = soup.find_all('div', class_='product-card-content__car-info')

price_lst = []
for tag in car_elements:
    price = tag.find('div', class_='product-card-pricing__price')
    price_lst.append(price.text.strip())
car_detail = []
for tag in car_elements:
    car = tag.find('h3', class_='product-card-details__title')
    car_detail.append(car.text.strip())

all_car = zip(car_detail, price_lst)
# Create the pandas DataFrame
df = pd.DataFrame(all_car, columns=['Name', 'price'])
df.to_excel("car_info.xlsx", index=False, sheet_name='car_info')

Output:
>>> df
                 Name   price
0      Vauxhall Astra  £6,495
1   Vauxhall Insignia    £295
2          MINI Hatch  £6,800
3        BMW 1 SERIES  £1,000
4         Mazda Eunos  £1,200
5   MINI Hatch Cooper  £1,995
6        Lexus GS 300    £570
7      Renault Megane  £1,695
8           Volvo C70    £600
9      Vauxhall Corsa  £2,195
10           Rover 25    £620
11      Vauxhall Adam  £3,795
12     Vauxhall Astra  £6,495
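Once the single-page version looks right, the same pandas approach can be extended to several pages. This is a minimal sketch, assuming the class names from the thread still match the site; the helper names and page count are illustrative. Each card is parsed once for both fields, so a missing name or price becomes an empty string instead of raising AttributeError on None. Rows from every page go into one list created before the loop, so nothing resets per page, and to_excel() is called a single time at the end. Writing .xlsx files with pandas also needs an Excel engine such as openpyxl installed.

```python
# Search URL and User-Agent from the thread; the page number is appended
# to the trailing "page=".
BASE_URL = ('https://www.autotrader.co.uk/car-search?advertClassification=standard'
            '&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New'
            '&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE'
            '&include-delivery-option=on&page=')
AGENT = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}


def parse_cards(cards):
    """Return one (name, price) tuple per car card; missing fields become ''."""
    rows = []
    for tag in cards:
        name = tag.find('h3', class_='product-card-details__title')
        price = tag.find('div', class_='product-card-pricing__price')
        rows.append((name.get_text().strip() if name is not None else '',
                     price.get_text().strip() if price is not None else ''))
    return rows


def scrape(pages=5, out_file='car_info.xlsx'):
    # Imported here so parse_cards can be tried without these packages.
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    all_rows = []  # created once, before the page loop, so nothing resets
    for page in range(1, pages + 1):
        response = requests.get(BASE_URL + str(page), headers=AGENT, timeout=10)
        soup = BeautifulSoup(response.content, 'lxml')
        cards = soup.find_all('div', class_='product-card-content__car-info')
        all_rows.extend(parse_cards(cards))

    df = pd.DataFrame(all_rows, columns=['Name', 'price'])
    df.to_excel(out_file, index=False, sheet_name='car_info')
    return df
```

Calling scrape() returns the combined DataFrame, so it can be inspected the same way as the single-page output above before relying on the saved file.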