Python Forum
Beginner help - Leap Year Issue Feb 29 and multiple pages
#1
Hello All,

I am an absolute beginner. Below is a script I half wrote myself and half pieced together from various online sources.
It essentially collects the name, price and sale date of sold items on eBay.
I think I've solved the problem of converting prices with thousands separators (e.g. 4,000) to numbers, but I can't find an answer online for the leap-year problem with datetime.strptime: parsing a Feb 29 sale fails with 'ValueError: day is out of range for month'.
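A minimal reproduction of that error, assuming the scraped text looks like "Feb-29 10:15" (the format string in the script suggests this, but the exact eBay text is an assumption). When the format has no %Y, strptime fills in the year 1900, and 1900 is not a leap year, so Feb 29 does not exist in it. Supplying a year explicitly (2020 here, chosen only for illustration) makes the same text parse.

```python
import calendar
import warnings
from datetime import datetime

ended = "Feb-29 10:15"  # assumed example in the '%b-%d %H:%M' format the script uses

print(calendar.isleap(1900))  # False: the year strptime assumes when none is given

failed = False
with warnings.catch_warnings():
    # Newer Python versions may also warn when a day is parsed without a year.
    warnings.simplefilter("ignore", DeprecationWarning)
    try:
        datetime.strptime(ended, "%b-%d %H:%M")  # year defaults to 1900
    except ValueError as exc:
        failed = True
        print("ValueError:", exc)

# Prepending a year (2020, a leap year) makes Feb 29 a valid date.
fixed = datetime.strptime("2020-" + ended, "%Y-%b-%d %H:%M")
print(fixed.strftime("%d-%b"))  # 29-Feb
```

The open question is then which year to attach, since the listing text does not contain one.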

Another issue: I don't think I'm using numpy correctly in 'pages = np.arange(1, 1000, 50)'. Sometimes it scrapes the results and then repeats them, and sometimes it gets two or three fewer prices/products than the page shows (e.g. I can see 69 on the webpage but only get 67 or 66). Any help with this part would also be appreciated.
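A sketch of the paging part, assuming `_pgn` in the eBay URL is a page number (1, 2, 3, ...) rather than an item offset; that is a guess from the parameter name, not something confirmed in this thread. `np.arange(1, 1000, 50)` produces 1, 51, 101, ..., so under that assumption the loop skips every page in between, and page numbers past the last real page might just return repeated results. The query parameters below are copied from the original URL; `search_url` and `max_pages` are hypothetical names. Building the query with `urllib.parse.urlencode` also keeps a space between the search term and the `-` exclusions, which `qstr + qstrr` drops (so "iphone" and "-case" become "iphone-case").

```python
from urllib.parse import urlencode

# np.arange(1, 1000, 50) yields the same values as range(1, 1000, 50):
print(list(range(1, 1000, 50))[:5])  # [1, 51, 101, 151, 201]

BASE_URL = "https://www.ebay.com.au/sch/i.html"

def search_url(search, exclude, page):
    """Build one results-page URL (hypothetical helper; params copied from the script)."""
    params = {
        "_from": "R40",
        "_nkw": f"{search} {exclude}".strip(),  # keep a space before the '-' terms
        "_sacat": "0",
        "LH_TitleDesc": "0",
        "_fsrp": "1",
        "LH_Complete": "1",
        "rt": "nc",
        "LH_Sold": "1",
        "_pgn": str(page),  # assumed to be a plain page number
    }
    return BASE_URL + "?" + urlencode(params)

max_pages = 3  # hypothetical limit; in practice stop when a page adds no new items
urls = [search_url("iphone", "-case", page) for page in range(1, max_pages + 1)]
for url in urls:
    print(url)
```

If you would rather let requests do the encoding, `requests.get(BASE_URL, params=...)` accepts the same dictionary.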

All help appreciated, thanks.
import requests
from requests import get
from bs4 import BeautifulSoup
from urllib.request import urlretrieve
from urllib.parse import quote
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import date
from time import sleep
from random import randint

#Initialize empty lists where you'll store your data
Product_name = []
Price = []
Date_sold = []

Search_name = input("Search for: ")
qstr = quote(Search_name)
Exclude_terms = input("Exclude these terms (- infront of all): ")
qstrr = quote(Exclude_terms)

pages = np.arange(1, 1000, 50)

for page in pages:

	page = requests.get("https://www.ebay.com.au/sch/i.html?_from=R40&_nkw=" + qstr + qstrr + "&_sacat=0&LH_TitleDesc=0&_fsrp=1&LH_Complete=1&rt=nc&LH_Sold=1&_pgn=" + str(page))
	
	soup = BeautifulSoup(page.text, 'html.parser')

	search = soup.find_all('div', class_='s-item__wrapper')

	sleep(randint(2,10))

	for container in search:
  
		#Name
		name = container.h3.text.strip()
		Product_name.append(name)

		#Price
		price = container.find('span', class_='s-item__price').text.strip() if container.find('span', class_='POSITIVE') else ''
		Price.append(price)

		#Date Sold
		sold = container.find('span', class_='s-item__ended-date').text
		soldd = datetime.strptime(sold, '%b-%d %H:%M')
		solddd = datetime.strftime(soldd, '%d-%b') 
		Date_sold.append(solddd)

#building our Pandas dataframe         
EBay_Products = pd.DataFrame({
'Product Name': Product_name,
'Price': Price,
'Sold Day' : Date_sold
})

EBay_Products['Price'] = EBay_Products['Price'].map(lambda x: x.lstrip('AU $'))
EBay_Products['Price'] = pd.to_numeric(EBay_Products['Price'].str.replace(',',''), errors='coerce')

EBay_Products.to_csv(Search_name + " scraped on " + str(date.today()) + '.csv')
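On the price step: `str.lstrip('AU $')` removes any run of leading 'A', 'U', ' ' and '$' characters rather than the literal prefix "AU $". It happens to work for strings like "AU $4,000.00", but a slightly stricter sketch matches the whole price with a regular expression. This assumes prices always look like "AU $1,234.56"; the `parse_price` name and the choice to return None for anything else (blank cells, price ranges) are illustrative, not from the original script.

```python
import re

# Assumed price format, based on the 'AU $' prefix the original strips off.
PRICE_RE = re.compile(r"AU \$([\d,]+(?:\.\d+)?)")

def parse_price(text):
    """Return the price as a float, or None if text is not a single AU $ price."""
    match = PRICE_RE.fullmatch(text.strip())
    if match is None:
        return None  # e.g. '' or a range such as 'AU $10.00 to AU $20.00'
    return float(match.group(1).replace(",", ""))  # drop thousands separators

print(parse_price("AU $4,000.00"))  # 4000.0
print(parse_price(""))  # None
```

With pandas, `EBay_Products['Price'] = EBay_Products['Price'].map(parse_price)` would take the place of the lstrip and to_numeric lines.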
#2
You will have to extract the dates and test each one.
Use the calendar module's isleap() function:
>>> import calendar
>>> print(calendar.isleap(2020))
True
>>> print(calendar.isleap(2019))
False
#3
(May-06-2020, 05:22 PM)Larz60+ Wrote: you will have to extract dates and test each
use calendar function:
>>> import calendar
>>> print(calendar.isleap(2020))
True
>>> print(calendar.isleap(2019))
False

I appreciate the response, but I have no idea what to do with this.
I think you mean that once I have the date, I run it through a pass/fail check for a leap year or Feb-29 and, if it fails, apply something to fix it. But I can't find anything, and nothing I've tried works. Sorry, but can I get more help please?
		
sold = container.find('span', class_='s-item__ended-date').text
soldd = datetime.strptime(sold, '%b-%d %H:%M')
solddd = datetime.strftime(soldd, '%d-%b') 
Date_sold.append(solddd)
I also thought I could try print(calendar.isleap(sold)).
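On `calendar.isleap(sold)`: `isleap` expects an integer year, while `sold` is a string such as "Feb-29 10:15" (assumed format) with no year in it, so the call raises a TypeError. The pass/fail idea can instead be written as try/except around a strptime call that does include a year. The sketch below uses a hypothetical `infer_sold_datetime` helper that tries the current year first, falls back to the previous year, and keeps the first valid date that is not in the future. It assumes sold listings are less than about a year old, which is an assumption about eBay's sold results, not something stated in the thread.

```python
import calendar
from datetime import datetime

sold = "Feb-29 10:15"  # assumed example of the scraped text

isleap_failed = False
try:
    calendar.isleap(sold)  # isleap wants an int year, not a date string
except TypeError as exc:
    isleap_failed = True
    print("TypeError:", exc)

def infer_sold_datetime(text, now):
    """Hypothetical helper: attach a year to a 'Mon-DD HH:MM' string.

    Tries now.year, then now.year - 1, and returns the first candidate that
    is a valid date and not later than `now`.
    """
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime.strptime(f"{year}-{text}", "%Y-%b-%d %H:%M")
        except ValueError:
            continue  # e.g. Feb-29 in a year where calendar.isleap(year) is False
        if candidate <= now:
            return candidate
    raise ValueError(f"could not place {text!r} within the last year")

print(infer_sold_datetime(sold, datetime(2020, 5, 6)).strftime("%d-%b-%Y"))  # 29-Feb-2020
print(infer_sold_datetime("Dec-30 09:00", datetime(2021, 1, 5)).strftime("%d-%b-%Y"))  # 30-Dec-2020
```

In the scraping loop, the strptime line could then become something like `soldd = infer_sold_datetime(sold.strip(), datetime.now())`, keeping the existing strftime call.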
#4
Can anyone help, please?


Possibly Related Threads…
Thread Author Replies Views Last Post
  Scrape table from multiple pages Nhattanktnn 1 860 Jun-07-2023, 09:35 AM
Last Post: Larz60+
  Web scrap multiple pages anilacem_302 3 3,828 Jul-01-2020, 07:50 PM
Last Post: mlieqo
  scraping multiple pages from table bandar 1 2,687 Jun-27-2020, 10:43 PM
Last Post: Larz60+
  Scraping Multiple Pages mbadatanut 1 4,220 May-08-2020, 02:30 AM
Last Post: Larz60+
  Looping through multiple pages with changing url Qaruri 2 2,575 Jan-17-2020, 01:55 PM
Last Post: Qaruri
  How to handle tables splitted across multiple web pages ankitjindalbti 2 2,096 Jun-02-2019, 07:33 AM
Last Post: ankitjindalbti
  scraping multiple pages of a website. Blue Dog 14 22,402 Jun-21-2018, 09:03 PM
Last Post: Blue Dog
