Code Help - Python Forum (https://python-forum.io), Web Scraping & Web Development
Code Help - luke_m - Apr-14-2021

Hi, I am trying to scrape some data off a website and then export that data into an Excel file (xlsx). I am struggling to find the file, or it isn't being written at all. There is also another issue: the value of n resets for each page of the website that the script goes through.

    import requests
    import os
    from bs4 import BeautifulSoup as BS
    import xlsxwriter

    page_number=0
    page_no=str(page_number)
    workbook.write.xlsx(df,"C:\Users\lukem\Desktop\datacollection", col_names = TRUE)
    workbook = xlsxwriter.Workbook('data_collection')
    worksheet = workbook.add_worksheet()
    n = 1
    car_entry=[]
    URL = ('https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page=')
    agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    while page_number < 5:
        n=n
        page_number = page_number+1
        page=requests.get(URL + page_no, headers=agent)
        print(page_number)
        soup =BS(page.text,'html.parser')
        car_elements = soup.find_all('div', class_='product-card-content__car-info')
        for element in car_elements:
            m = 0
            age_element = soup.find('div', class_='product-card-pricing__price')
            age_t = age_element.text
            worksheet.write(n,m,age_t)
            m=m+1
            name_element = soup.find('h3', class_='product-card-details__title')
            name_t = name_element.text
            worksheet.write(n,m,name_t)
            m=m+1
            key_spec_elements = soup.find('ul', class_='listing-key-specs')
            spec_list = key_spec_elements.text.split()
            b=1
            while b < len(spec_list):
                worksheet.write(n,m,spec_list[b])
                b = b+1
            car_entry.append([n,((spec_list,age_element,name_element))])
            print(car_entry)
            n = n + 1
        n=n
    workbook.close()
    #car_entry= [n, (spec_list,age_el6ement,name_element)


RE: Code Help - snippsat - Apr-15-2021

(Apr-14-2021, 05:15 PM)luke_m Wrote: I am struggling to find the file, or it isn't being written at all.

You should take a step back and not loop or write to a file (my file 👀) until everything works in a single-page test. I can give an example. I use pandas: once the data is in a DataFrame it looks like an Excel sheet. When it looks the way you want, just use df.to_excel().

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    url = 'https://www.autotrader.co.uk/car-search?advertClassification=standard&postcode=la94py&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&advertising-location=at_cars&is-quick-search=TRUE&include-delivery-option=on&page=1'
    agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    response = requests.get(url, headers=agent)
    soup = BeautifulSoup(response.content, 'lxml')
    car_elements = soup.find_all('div', class_='product-card-content__car-info')

    # Collect the price from each car card
    price_lst = []
    for tag in car_elements:
        price = tag.find('div', class_='product-card-pricing__price')
        price_lst.append(price.text.strip())

    # Collect the title from each car card
    car_detail = []
    for tag in car_elements:
        car = tag.find('h3', class_='product-card-details__title')
        car_detail.append(car.text.strip())

    all_car = zip(car_detail, price_lst)
    # Create the pandas DataFrame
    df = pd.DataFrame(all_car, columns=['Name', 'price'])
    df.to_excel("car_info.xlsx", index=False, sheet_name='car_info')
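
Once the single-page test gives the rows you want, one possible way to extend it to several pages is sketched below. This is only a sketch under assumptions, not a tested scraper for the live site: the helper names parse_cards, page_urls and scrape are made up for illustration, the class names are the ones used in the posts above, and the page markup may have changed since. Two details differ from the original script: the page number is put into the URL inside the loop (in the original, page_no is computed once before the loop, so every request asks for page 0), and each field is looked up inside its own card rather than with soup.find, which always returns the first match on the whole page.

```python
from bs4 import BeautifulSoup


def parse_cards(html):
    """Return one (name, price) tuple per car card found in the page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="product-card-content__car-info"):
        # Search inside this card only, so each row gets its own car.
        name = card.find("h3", class_="product-card-details__title")
        price = card.find("div", class_="product-card-pricing__price")
        if name is None or price is None:
            continue  # skip cards that lack one of the two fields
        rows.append((name.text.strip(), price.text.strip()))
    return rows


def page_urls(base_url, last_page):
    """Build one URL per page; base_url is assumed to end with 'page='."""
    return [f"{base_url}{page}" for page in range(1, last_page + 1)]


def scrape(base_url, headers, last_page=5):
    """Fetch every page and return all rows in a single DataFrame."""
    # Imported here so the two helpers above can be tried offline
    # without requests or pandas installed.
    import requests
    import pandas as pd

    all_rows = []
    for url in page_urls(base_url, last_page):
        response = requests.get(url, headers=headers, timeout=30)
        # Rows from every page go into one list, so nothing restarts per page.
        all_rows.extend(parse_cards(response.text))
    return pd.DataFrame(all_rows, columns=["Name", "price"])
```

With the url (minus the trailing 1) and agent from the example above, df = scrape(url, agent) followed by df.to_excel("car_info.xlsx", index=False) would write all pages to one sheet. Writing .xlsx from pandas needs an Excel engine such as openpyxl or xlsxwriter installed, and the file lands in the current working directory unless a full path is given.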