Python Forum

Hello All,

I am an absolute beginner. Below is a script I half wrote and half assembled from bits of several online sources.
Essentially, it gets the name, price and sold date of sold items on eBay.
I think I've solved the issue of converting prices with thousands separators (i.e. 4,000) to numbers, but I can't find an answer online for the leap year issue in datetime.strptime. The error is 'ValueError: day is out of range for month'.
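The error can be reproduced on its own. A minimal sketch, assuming the scraped ended-date text looks like 'Feb-29 14:05' with English month abbreviations (inferred from the '%b-%d %H:%M' format in the script, not checked against eBay): when the format contains no %Y, strptime fills in the default year 1900, and 1900 is not a leap year, so February 29 is rejected.

```python
from datetime import datetime

FMT = "%b-%d %H:%M"  # same format string as the script; note there is no year

# With no year in the text, strptime uses the default year 1900.
feb28 = datetime.strptime("Feb-28 14:05", FMT)
print(feb28)  # 1900-02-28 14:05:00

# 1900 is not a leap year, so Feb 29 does not exist and parsing fails.
feb29_failed = False
try:
    datetime.strptime("Feb-29 14:05", FMT)
except ValueError as exc:
    feb29_failed = True
    print("ValueError:", exc)
```

So the date has to get a year (in both the string and the format) before it is parsed; checking it afterwards is too late, because strptime already raised.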

Another issue is that I don't think I'm using numpy correctly in 'pages = np.arange()'. Sometimes it scrapes the info and then repeats it, and sometimes it returns two or three fewer prices/products than I can see on the web page (i.e. I see 69 but only get 67 or 66). Any help with this section would also be appreciated.
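Part of the pagination problem shows up in the numbers alone: np.arange(1, 1000, 50) produces 1, 51, 101, ..., and each value is sent as _pgn, which the URL appears to use as a page number rather than an item offset. The loop therefore requests page 1, then page 51, and so on, skipping everything in between. A minimal sketch, assuming _pgn is a 1-based page number and using a hypothetical MAX_PAGES cap and hypothetical search terms, builds one URL per consecutive page with urllib.parse.urlencode (which also puts a space between the search and exclude terms):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.ebay.com.au/sch/i.html"
MAX_PAGES = 5  # hypothetical cap; stop earlier if a page yields no new items

# The original stride of 50 jumps straight past pages 2-50:
print(list(range(1, 1000, 50))[:4])  # [1, 51, 101, 151]

def build_search_url(search, exclude, page):
    """Return the sold-listings URL for one results page.

    The query parameters are copied from the original script; only the
    search text and _pgn vary. Assumption: _pgn is a 1-based page number.
    """
    params = {
        "_from": "R40",
        "_nkw": f"{search} {exclude}".strip(),  # e.g. "lego -duplo"
        "_sacat": "0",
        "LH_TitleDesc": "0",
        "_fsrp": "1",
        "LH_Complete": "1",
        "rt": "nc",
        "LH_Sold": "1",
        "_pgn": str(page),
    }
    return BASE_URL + "?" + urlencode(params)

# Consecutive page numbers 1..MAX_PAGES ("lego"/"-duplo" are hypothetical terms).
urls = [build_search_url("lego", "-duplo", page) for page in range(1, MAX_PAGES + 1)]
print(urls[0])
print(parse_qs(urlsplit(urls[0]).query)["_pgn"])  # ['1']
```

The repeated results may come from requesting page numbers past the last real page; that is an unverified guess, so it is safer to stop the loop once a page adds no new items than to rely on a fixed upper bound.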

All help appreciated, thanks.

import requests
from requests import get
from bs4 import BeautifulSoup
from urllib.request import urlretrieve
from urllib.parse import quote
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import date
from time import sleep
from random import randint

#Initialize empty lists where you'll store your data
Product_name = []
Price = []
Date_sold = []

Search_name = input("Search for: ")
qstr = quote(Search_name)
Exclude_terms = input("Exclude these terms (- infront of all): ")
qstrr = quote(Exclude_terms)

pages = np.arange(1, 1000, 50)

for page in pages:

	page = requests.get("https://www.ebay.com.au/sch/i.html?_from=R40&_nkw=" + qstr + qstrr + "&_sacat=0&LH_TitleDesc=0&_fsrp=1&LH_Complete=1&rt=nc&LH_Sold=1&_pgn=" + str(page))
	
	soup = BeautifulSoup(page.text, 'html.parser')

	search = soup.find_all('div', class_='s-item__wrapper')

	sleep(randint(2,10))

	for container in search:
  
		#Name
		name = container.h3.text.strip()
		Product_name.append(name)

		#Price
		price = container.find('span', class_='s-item__price').text.strip() if container.find('span', class_='POSITIVE') else ''
		Price.append(price)

		#Date Sold
		sold = container.find('span', class_='s-item__ended-date').text
		soldd = datetime.strptime(sold, '%b-%d %H:%M')
		solddd = datetime.strftime(soldd, '%d-%b') 
		Date_sold.append(solddd)

#building our Pandas dataframe         
EBay_Products = pd.DataFrame({
'Product Name': Product_name,
'Price': Price,
'Sold Day' : Date_sold
})

EBay_Products['Price'] = EBay_Products['Price'].map(lambda x: x.lstrip('AU $'))
EBay_Products['Price'] = pd.to_numeric(EBay_Products['Price'].str.replace(',',''), errors='coerce')

EBay_Products.to_csv(Search_name + " scraped on " + str(date.today()) + '.csv')
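On the price conversion: str.lstrip('AU $') removes any leading run of the characters 'A', 'U', ' ' and '$', not the literal prefix 'AU $', so it works for 'AU $4,000.00' mostly by coincidence. A minimal alternative sketch, assuming prices look like 'AU $4,000.00' and, for a range such as 'AU $10.00 to AU $20.00', taking the first amount (an arbitrary choice), pulls the number out with a regular expression; parse_price is a hypothetical helper name:

```python
import re

# First amount such as "4,000.00" or "25" (commas as thousands separators).
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_price(text):
    """Return the first amount in `text` as a float, or None if there is none."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

print(parse_price("AU $4,000.00"))            # 4000.0
print(parse_price("AU $10.00 to AU $20.00"))  # 10.0
print(parse_price(""))                        # None
```

With this, EBay_Products['Price'].map(parse_price) could replace the lstrip and replace steps; keep pd.to_numeric(..., errors='coerce') afterwards if you want a guaranteed numeric column.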

You will have to extract the dates and test each one.
Use the calendar module:

>>> import calendar
>>> print(calendar.isleap(2020))
True
>>> print(calendar.isleap(2019))
False
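One way to put this suggestion into code is to choose a year for each scraped date before parsing it, using calendar.isleap to skip years that have no February 29. A minimal sketch, assuming the ended-date text looks like 'Feb-29 14:05' with English month abbreviations and that a sold date is never in the future (so the most recent matching year up to today is used); parse_ended_date is a hypothetical helper name:

```python
import calendar
from datetime import date, datetime

def parse_ended_date(text, today=None):
    """Parse an ended-date string such as 'Feb-29 14:05' into a datetime.

    A year is attached before parsing, so strptime no longer falls back
    to 1900. Assumption: the sale happened on or before `today`.
    """
    today = today or date.today()
    text = text.strip()
    is_feb_29 = text.startswith("Feb-29")
    # Leap years are at most 8 years apart (e.g. 1896 and 1904), so
    # checking 9 years always includes one.
    for year in range(today.year, today.year - 9, -1):
        if is_feb_29 and not calendar.isleap(year):
            continue  # Feb 29 only exists in leap years
        parsed = datetime.strptime(f"{year}-{text}", "%Y-%b-%d %H:%M")
        if parsed.date() <= today:
            return parsed  # most recent year that is not in the future
    raise ValueError(f"could not place {text!r} in a recent year")

print(parse_ended_date("Feb-29 14:05", today=date(2021, 3, 1)))  # 2020-02-29 14:05:00
print(parse_ended_date("Dec-30 09:00", today=date(2021, 1, 5)))  # 2020-12-30 09:00:00
```

In the scraping loop, soldd = parse_ended_date(sold) would take the place of the strptime call, and the existing strftime(soldd, '%d-%b') line keeps working unchanged.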

(May-06-2020, 05:22 PM) Larz60+ wrote:
you will have to extract dates and test each
use calendar function:
>>> import calendar
>>> print(calendar.isleap(2020))
True
>>> print(calendar.isleap(2019))
False

I appreciate the response, but I have no idea what to do with this.
I thought maybe you meant that once I get the date, I run it through a pass/fail check for a leap year (or Feb-29), and if it fails, apply something to fix it. But I cannot find anything, and nothing I've tried works. Sorry, but can I get more help, please? This is the date part of my loop:

		
sold = container.find('span', class_='s-item__ended-date').text
soldd = datetime.strptime(sold, '%b-%d %H:%M')
solddd = datetime.strftime(soldd, '%d-%b') 
Date_sold.append(solddd)

And I thought, OK, what if I do 'print(calendar.isleap(sold))'?
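calendar.isleap expects an integer year, so passing the scraped text (here a hypothetical string 'Feb-29 14:05') raises a TypeError instead of answering the question. The check belongs on the year you are about to attach to the date, as this small sketch shows:

```python
import calendar

sold = "Feb-29 14:05"  # hypothetical example of the scraped ended-date text

# isleap computes `year % 4`; on a str, % means string formatting,
# which raises TypeError here.
string_rejected = False
try:
    calendar.isleap(sold)
except TypeError as exc:
    string_rejected = True
    print("TypeError:", exc)

# Pass a year instead, e.g. the year you plan to attach to the date.
for year in (2019, 2020):
    # Feb 29 only exists when this prints True.
    print(year, calendar.isleap(year))
```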

Can anyone help, please?

warriordazza

Larz60+
