Python Forum
Beginner help - Leap Year Issue Feb 29 and multiple pages
Hello All,

I am an absolute beginner. Below is the script I half wrote and half assembled from bits of multiple online sources.
Essentially, it gets the name, price and sold date of items sold on eBay.
I think I've solved the issue of converting prices with thousands separators (e.g. 4,000) to numeric, but I can't seem to find an answer online about fixing the leap year issue in datetime.strptime. The error is 'ValueError: day is out of range for month'.
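
For context on that error: when the format string has no year, datetime.strptime defaults to the year 1900, which was not a leap year, so 'Feb-29' is rejected. Below is a minimal sketch of one possible workaround. It assumes the sold strings look like 'Feb-29 14:05' (matching the '%b-%d %H:%M' format in the script below) and that month names are in English. The helper name parse_sold_date and its today parameter are hypothetical, not part of any library.

```python
from datetime import datetime

def parse_sold_date(text, today=None):
    """Parse a string such as 'Feb-29 14:05' that carries no year.

    Hypothetical helper: it guesses the year, because the string has none.
    """
    if today is None:
        today = datetime.now()
    # Try the current year first, then step back. Sold listings are in the
    # past, and 8 years is enough to reach a leap year for Feb-29.
    for year in range(today.year, today.year - 8, -1):
        try:
            parsed = datetime.strptime(f"{year} {text.strip()}", "%Y %b-%d %H:%M")
        except ValueError:
            continue  # e.g. Feb-29 in a year that is not a leap year
        if parsed <= today:
            return parsed
    raise ValueError(f"could not parse sold date: {text!r}")
```

In the loop this could be used as parse_sold_date(sold).strftime('%d-%b') in place of the two strptime/strftime lines. The guessed year is only as good as the assumption that every listing was sold recently.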

Another issue is that I don't think I'm using numpy right in 'pages = np.arange()'. Sometimes it scrapes info and then repeats it; sometimes it scrapes a couple fewer prices/products than the X I can see on the webpage (e.g. I see 69 but get only 67 or 66). Any help with this section is also needed and appreciated, please.
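
A possible cause of the repeats: np.arange(1, 1000, 50) produces 1, 51, 101, ..., 951, and each value is sent as _pgn. If _pgn is a page number rather than an item offset (an assumption based on the parameter name, not on eBay documentation), the script asks for pages 51, 101 and so on, which may not exist. The script also joins the search and exclusion terms with no space between them. The sketch below builds one URL per consecutive page using only the standard library. The helper name build_search_urls and the max_pages parameter are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com.au/sch/i.html"

def build_search_urls(search_name, exclude_terms, max_pages):
    """Return one search URL per consecutive results page, starting at 1."""
    # Join the terms with a space before encoding, so "lego" and "-duplo"
    # do not run together as "lego-duplo".
    keywords = f"{search_name} {exclude_terms}".strip()
    urls = []
    for page_number in range(1, max_pages + 1):  # 1, 2, 3, ... not 1, 51, 101
        params = {
            "_from": "R40",
            "_nkw": keywords,
            "_sacat": 0,
            "LH_TitleDesc": 0,
            "_fsrp": 1,
            "LH_Complete": 1,
            "rt": "nc",
            "LH_Sold": 1,
            "_pgn": page_number,  # assumed to be a page number, not an offset
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

Each URL can then be passed to requests.get in the loop. Breaking out of the loop as soon as a page yields no 's-item__wrapper' containers would avoid requesting pages past the end of the results.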

All help appreciated, thanks.
import requests
from requests import get
from bs4 import BeautifulSoup
from urllib.request import urlretrieve
from urllib.parse import quote
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import date
from time import sleep
from random import randint

#Initialize empty lists where you'll store your data
Product_name = []
Price = []
Date_sold = []

Search_name = input("Search for: ")
qstr = quote(Search_name)
Exclude_terms = input("Exclude these terms (- infront of all): ")
qstrr = quote(Exclude_terms)

pages = np.arange(1, 1000, 50)

for page in pages:

	page = requests.get("https://www.ebay.com.au/sch/i.html?_from=R40&_nkw=" + qstr + qstrr + "&_sacat=0&LH_TitleDesc=0&_fsrp=1&LH_Complete=1&rt=nc&LH_Sold=1&_pgn=" + str(page))
	
	soup = BeautifulSoup(page.text, 'html.parser')

	search = soup.find_all('div', class_='s-item__wrapper')

	sleep(randint(2,10))

	for container in search:
  
		#Name
		name = container.h3.text.strip()
		Product_name.append(name)

		#Price
		price = container.find('span', class_='s-item__price').text.strip() if container.find('span', class_='POSITIVE') else ''
		Price.append(price)

		#Date Sold
		sold = container.find('span', class_='s-item__ended-date').text
		soldd = datetime.strptime(sold, '%b-%d %H:%M')
		solddd = datetime.strftime(soldd, '%d-%b') 
		Date_sold.append(solddd)

#building our Pandas dataframe         
EBay_Products = pd.DataFrame({
'Product Name': Product_name,
'Price': Price,
'Sold Day' : Date_sold
})

EBay_Products['Price'] = EBay_Products['Price'].map(lambda x: x.lstrip('AU $'))
EBay_Products['Price'] = pd.to_numeric(EBay_Products['Price'].str.replace(',',''), errors='coerce')

EBay_Products.to_csv(Search_name + " scraped on " + str(date.today()) + '.csv')
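
A note on the price cleanup in the script above: str.lstrip('AU $') removes any leading run of the characters 'A', 'U', space and '$', not the literal prefix 'AU $'. It works for strings like 'AU $4,000.00' but only by coincidence. A minimal sketch of a more explicit approach follows, assuming prices look like 'AU $4,000.00' and that a listing may occasionally show a range. The helper name parse_price is hypothetical.

```python
import re

def parse_price(text):
    """Return the first number in a price string as a float, or None."""
    # Match digits with optional thousands separators and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # empty string or no digits at all
    return float(match.group().replace(",", ""))
```

This could be applied with EBay_Products['Price'].map(parse_price) in place of the lstrip and to_numeric lines. For a range it keeps only the lower bound, which may or may not be what is wanted.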


Messages In This Thread
Beginner help - Leap Year Issue Feb 29 and multiple pages - by warriordazza - May-06-2020, 01:03 PM
