Python Forum
Literal beginner - needs help
#1
Hello all,

I was going to try to find an intro section on the website, but I couldn't find one.
Essentially, for a while now I have been building some intricate Excel workbooks that pull in the data I want, do what I need, and are all linked together. The issue is that the data I source is all gathered manually, so I wanted to automate this further and thought I should try to learn Python. Going into this I have zero idea how it all works, but from searching I gather that web scraping is what I'm after. My thinking is that if I learn web scraping directly, I won't waste months trying to learn Python from the very beginning. This could be the wrong mindset, but like I said I have no prior knowledge, so it seems reasonable at the moment.

My end goal is to scrape NBA player data and cross-reference it with NBA player card prices (either from eBay or a site called PWCC).

What I have done so far is watch one tutorial on web scraping and read another - both used differently formatted scripts.
I followed both with mild success. My first real issue came when following the one I read.

I'll try my best to insert it properly, but essentially, in this little practice script I am trying to scrape the prices of NBA card boxes across multiple pages. I can't seem to get all the data; some of the items are not being picked up. I have gone to those pages individually and can find the items manually, but when I run the script, no luck. Hopefully someone can help me, and hopefully this is allowed; if it is, I look forward to posting here a lot more to learn and get better at this process.
Thanks
Dazza

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

from time import sleep
from random import randint

# Initialize empty lists where you'll store your data
Product_name = []
Price = []

pages = np.arange(1, 50, 24)

for page in pages:

    response = requests.get("https://www.cherrycollectables.com.au/collections/nba?" + str(page) + "&view=view-24&grid_list=grid-view")

    soup = BeautifulSoup(response.text, 'html.parser')

    cherry = soup.find_all('div', class_='productitem--info')

    # Pause between requests so we don't hammer the server
    sleep(randint(2, 10))

    for container in cherry:

        # Name
        name = container.h2.a.text.strip()
        Product_name.append(name)

        # Price
        price = container.div.find('div', class_='price--main').text.strip()
        Price.append(price)

# Build the pandas DataFrame
Cherry_Products = pd.DataFrame({
    'Product Name': Product_name,
    'Price': Price,
})

Cherry_Products.to_csv('Cherry Products.csv')
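A side note on the pagination: the URL above appends the bare page number right after the "?" with no parameter name (producing e.g. "...?1&view=..."), so the server may simply ignore it and return the same first page for every request - which would explain items going missing. Collection pages on Shopify-style shops usually take a "page" query parameter; here is a minimal, untested sketch of building the URLs that way (the parameter name "page" and the helper function are assumptions, not something from the original script):

```python
# Hypothetical sketch: build one URL per results page, naming the
# pagination parameter explicitly instead of appending a bare number.
BASE = "https://www.cherrycollectables.com.au/collections/nba"

def page_urls(last_page):
    # "page" as the parameter name is an assumption (common on
    # Shopify collection pages) - verify against the site itself.
    return [f"{BASE}?page={n}&view=view-24&grid_list=grid-view"
            for n in range(1, last_page + 1)]

for url in page_urls(3):
    print(url)
```

Each URL from the helper could then be passed to requests.get() in place of the concatenated string in the loop above.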
#2
This will be a challenge with all the pop-ups this site has, but not impossible.
You can run through the two web scraping tutorials, which will give you the basics you need to scrape data.
It won't take long, probably just one session.
web scraping part 1
web scraping part 2
#3
(Apr-27-2020, 05:11 AM)Larz60+ Wrote: This will be a challenge with all the pop-ups this site has, but not impossible.
You can run through the two web scraping tutorials, which will give you the basics you need to scrape data.
It won't take long, probably just one session.
web scraping part 1
web scraping part 2

Thanks, appreciate the links. I'm going to do them tonight. If I still need help with the specific script I copied, I'll speak up.

When I say literal beginner, I mean I know nothing, so I apologize, but I went through the first tutorial and scrolled through the second.
It doesn't really explain much of what I am actually doing (I get the logic for some of it).

Also, I don't see how this helps my question about why I wasn't getting all the values from my attempted script; am I missing something here?

