Aug-25-2023, 02:31 AM
I'm trying to build a scraper to get pricing and description from this site, just for the men's shoes.
When you visit the site normally in a browser, the page loads, but then some sort of "processing" activity makes the page inaccessible for 2 or 3 seconds; after that you can scroll and click on anything. This happens on every page as you navigate the page results. It doesn't seem to happen on the individual detail pages.
Anyway, in order to get the code below to work, I'm thinking that some sort of delay will need to be added at some point to allow the page to load and let that process run before trying to access and scrape the page. I ran the exact same code on another shoe site and it processed roughly 230 results in about 30 seconds.
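For what it's worth, here is a minimal sketch of the "add a delay" idea. The helper name `fetch_pages` and its parameters are made up for illustration, and the 3-second pause is just a guess based on the 2 or 3 seconds mentioned above. Note that requests does not run JavaScript, so if the "processing" step is done by scripts in the browser, a pause between requests may not be enough on its own.

```python
import time


def fetch_pages(fetch, page_numbers, delay_seconds=3.0, sleep=time.sleep):
    """Call fetch(page_number) for each page, pausing between requests.

    fetch is any callable that returns the page HTML (hypothetical helper);
    sleep is injectable so the pacing can be checked without really waiting.
    """
    pages = []
    for index, page_number in enumerate(page_numbers):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, not before the first one
        pages.append(fetch(page_number))
    return pages


# Possible usage (assumes the requests package is installed):
#   import requests
#   session = requests.Session()
#   url = 'https://www.dickssportinggoods.com/f/all-mens-footwear'
#   pages = fetch_pages(
#       lambda n: session.get(url, params={'pageNumber': n}, timeout=30).text,
#       range(0, 87),
#   )
```

Passing the fetch step in as a callable keeps the pacing logic separate from the HTTP call, so the same loop would work if the fetch were later swapped for a browser-driven one.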
import requests
from bs4 import BeautifulSoup

# https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=0 this is Page 1
response = requests.get('https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=2')
soup = BeautifulSoup(response.content,'lxml')
productdetails = []
for x in range(0,87):
    response = requests.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    soup = BeautifulSoup(response.content,'lxml')
    element_list = soup.find_all('div',class_='product-content')
    for element in element_list:
        for link in element.find_all('a', class_='product-card-simple-title'):
            print("Description: " + link.get_text().strip())
            productdetails.append("Description: " + link.get_text().strip())
        for price in element.find_all('span',class_='sr-only'):
            print("Price: " + price.get_text().strip().replace('\n', '').replace(' ','').replace('dollars','.').replace('cents',''))
            productdetails.append("Price: " + price.get_text().strip().replace('\n', '').replace(' ','').replace('dollars','.').replace('cents',''))
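The price cleanup in the code above is written out twice, so it could be pulled into a small helper. This is only a sketch: the names `clean_price` and `make_record` are hypothetical, and the sample text "89 dollars 99 cents" is an assumption about what the `sr-only` span contains, inferred from the replace calls. It strips all whitespace (slightly broader than the original, which removes only spaces and newlines).

```python
def clean_price(text):
    """Turn screen-reader price text into a plain 'dollars.cents' string."""
    compact = "".join(text.split())  # drop all whitespace, including newlines
    return compact.replace("dollars", ".").replace("cents", "")


def make_record(description, price_text):
    """Pair a description with its cleaned price (hypothetical record layout)."""
    return {"description": description.strip(), "price": clean_price(price_text)}


# Sample input is assumed, not taken from the real page.
record = make_record("  Example Running Shoe ", " 89 dollars\n 99 cents ")
print(record)
```

Storing each product as one dict, rather than appending separate "Description:" and "Price:" strings to a flat list, keeps each price attached to its description.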