Apr-26-2020, 11:58 PM
Hello all,
I was going to try to find an intro section on the website but I couldn't find one.
Essentially, I have for a while been building some intricate Excel workbooks that take the data I want, do what I want with it, and are all linked together. The issue is that the data I source is all entered manually, so I wanted to automate this further and thought I should try to learn Python. I have zero idea going into this how it all works, but I searched and I'm guessing that web scraping is what I am after. My thinking is that if I learn web scraping directly I won't waste months trying to learn Python from the beginning. This could be the wrong mindset but, like I said, I have no prior knowledge, so this seems reasonable at the moment.
My end goal is to scrape NBA player data and cross-reference it with NBA player card prices (either from eBay or a site called PWCC).
So far I have watched one tutorial on web scraping and read another; the two formatted their scripts differently.
I got through both with a mild level of success. My first real issue came up when following the one I read.
I'll try my best to insert it properly, but essentially in this little practice script I am trying to scrape prices of NBA card boxes over multiple pages. I seem to be unable to get all the data: some of the items are not getting picked up. I have gone to those pages individually and can find the items manually, but when I run the script they don't show up. Hopefully someone can help me, and hopefully this is allowed. If it is, I look forward to posting on here a lot more to try to learn and get better at this process.
Thanks
Dazza
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from time import sleep
from random import randint

#Initialize empty lists where you'll store your data
Product_name = []
Price = []

pages = np.arange(1, 50, 24)

for page in pages:
    page = requests.get("https://www.cherrycollectables.com.au/collections/nba?" + str(page) + "&view=view-24&grid_list=grid-view")
    soup = BeautifulSoup(page.text, 'html.parser')
    cherry = soup.find_all('div', class_='productitem--info')
    sleep(randint(2,10))

    for container in cherry:
        #Name
        name = container.h2.a.text.strip()
        Product_name.append(name)

        #Price
        price = container.div.find('div', class_='price--main').text.strip()
        Price.append(price)

#building our Pandas dataframe
Cherry_Products = pd.DataFrame({
    'Product Name': Product_name,
    'Price': Price,
})

Cherry_Products.to_csv('Cherry Products.csv')
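For reference, here is a minimal sketch of which URLs the loop in the script ends up requesting, which may help narrow down why some items are missed. It only builds and prints the strings (no requests are sent). The helper names `urls_as_written` and `urls_with_page_key` are made up for illustration, and the second helper assumes the site paginates with a `page` query parameter, which is a guess and not something confirmed here.

```python
from urllib.parse import urlencode

BASE = "https://www.cherrycollectables.com.au/collections/nba"


def urls_as_written(pages):
    """Rebuild the URLs exactly the way the script concatenates them."""
    return [BASE + "?" + str(p) + "&view=view-24&grid_list=grid-view" for p in pages]


def urls_with_page_key(page_numbers):
    """Hypothetical variant.

    ASSUMPTION: the site expects a `page` query parameter (page=1, page=2, ...).
    This has not been checked against the site.
    """
    return [
        BASE + "?" + urlencode({"page": n, "view": "view-24", "grid_list": "grid-view"})
        for n in page_numbers
    ]


# range(1, 50, 24) produces the same values as np.arange(1, 50, 24): 1, 25, 49
print("As written in the script:")
for url in urls_as_written(range(1, 50, 24)):
    print(" ", url)

print("With an explicit page key (assumed):")
for url in urls_with_page_key(range(1, 4)):
    print(" ", url)
```

In the first list the numbers 1, 25 and 49 are appended without any parameter name, so it is possible the site ignores them and serves the same first page every time. Comparing `len(cherry)` per page in the real script against what the browser shows would be one way to check that.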