Python Forum
Is it possible to add a delay right after a request.get()
#1
I'm trying to build a scraper to get pricing and descriptions from this site, just for the men's shoes.
When you visit the site normally in a browser, the page loads, but then some sort of "processing" activity runs that makes the page unresponsive for 2 or 3 seconds before you can scroll or click on anything. This happens on every page as you navigate through the results, but it doesn't seem to happen on the individual detail pages.

Anyway, for the code below to work, I'm thinking some sort of delay will need to be added at some point to allow the page to load and let that process finish before trying to access and scrape it. I ran the exact same code on another shoe site and it processed roughly 230 results in about 30 seconds.

import requests
from bs4 import BeautifulSoup

# https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=0 is Page 1

productdetails = []

for x in range(0, 87):
    # Fetch one results page and parse it
    response = requests.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    soup = BeautifulSoup(response.content, 'lxml')

    element_list = soup.find_all('div', class_='product-content')
    for element in element_list:
        # Product description
        for link in element.find_all('a', class_='product-card-simple-title'):
            print("Description: " + link.get_text().strip())
            productdetails.append("Description: " + link.get_text().strip())
            # Price text, e.g. "54 dollars 99 cents" -> "54.99"
            for price in element.find_all('span', class_='sr-only'):
                price_text = price.get_text().strip().replace('\n', '').replace(' ', '').replace('dollars', '.').replace('cents', '')
                print("Price: " + price_text)
                productdetails.append("Price: " + price_text)
#2
Any update?
#3
For precise control, I would recommend Selenium. See this.
#4
Building a scraper can be challenging, especially when dealing with dynamic websites that have loading delays or use AJAX to load content. Given the "processing" activity you described, it sounds like the website might be using some form of lazy loading or client-side rendering, which affects when the data you want actually becomes available in the page.

To handle this in your scraper, you might need to use a tool like Selenium or Puppeteer, which allows for browser automation. These tools can mimic real user interactions, like waiting for a page to load fully before scraping the content.

Here's a basic approach using Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Initialize the browser
driver = webdriver.Chrome()

# Navigate to the website
driver.get('URL_OF_THE_WEBSITE')

# Wait for the "processing" activity to complete
wait = WebDriverWait(driver, 10) # wait for up to 10 seconds
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'CSS_SELECTOR_OF_AN_ELEMENT_YOU_WANT_TO_WAIT_FOR')))

# Now, you can scrape the content
content = driver.page_source

# Don't forget to close the browser once done
driver.quit()

Remember to replace 'URL_OF_THE_WEBSITE' with the actual URL and 'CSS_SELECTOR_OF_AN_ELEMENT_YOU_WANT_TO_WAIT_FOR' with a CSS selector of an element you know will be present after the "processing" activity.
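
If you want to keep your existing BeautifulSoup parsing, one option is to let Selenium render each results page and then hand driver.page_source to BeautifulSoup. Here's a rough sketch of that idea, assuming the product-content and product-card-simple-title class names from your first post are still correct for the rendered page:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
productdetails = []

for x in range(0, 87):
    # Let the browser load the page and wait until at least one product card is present
    driver.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.product-content')))

    # Hand the rendered HTML to BeautifulSoup and reuse the existing selectors
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for element in soup.find_all('div', class_='product-content'):
        for link in element.find_all('a', class_='product-card-simple-title'):
            productdetails.append("Description: " + link.get_text().strip())

driver.quit()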

Also, when building or using scrapers, always ensure you're respecting the website's robots.txt file and terms of service. Some sites might have restrictions against scraping, and you wouldn't want to inadvertently violate any terms.
Larz60+ write Aug-30-2023, 03:27 AM:
spam content removed
Also:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#5
Soon after posting my initial question above, I took a few days off, so no new updates yet, BUT I will be working on this again this week. Thank you for the suggestions; I will definitely work with the above sample to see if I can understand and work with Selenium.

You mentioned the robots.txt file. Is that something that has to be written into the Python code logic, or is it more a matter of reading that file to check for any restrictions?
#6
robots.txt will be found at the root of a website.
In your instance it can be found here.
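
To the other part of your question: you don't have to write anything special into your code; robots.txt is just a file you read to see which paths the site allows crawlers to fetch. If you do want to check it programmatically, the standard library's urllib.robotparser can do it. A minimal sketch (the '*' user agent is just a generic wildcard, and the page URL is the one from your first post):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://www.dickssportinggoods.com/robots.txt')
rp.read()

# True if the rules allow this user agent to fetch the given URL
url = 'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=2'
print(rp.can_fetch('*', url))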
#7
Yes, it is possible to add a delay right after making a request using the requests.get() function in Python. Adding a delay can be useful in situations where you want to control the rate at which you make requests to a web server, especially if you're scraping data or interacting with a web API that has rate-limiting policies.

You can introduce a delay using the time.sleep() function from the time module. Here's an example of how to add a delay of, let's say, 2 seconds after making a GET request:


import requests
import time

url = "https://example.com"
response = requests.get(url)

# Add a 2-second delay
time.sleep(2)

# Continue with your code after the delay

In this example, the time.sleep(2) line will pause the execution of your script for 2 seconds before moving on to the next line of code. You can adjust the delay duration by changing the argument to time.sleep() to suit your needs.

Keep in mind that while adding a delay can help you avoid overloading a server or violating rate limits, it will also make your script run slower. Balancing the delay duration is important to achieve the desired rate of requests while keeping your script reasonably efficient.
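
Applied to the paging loop from the first post, that would look something like the sketch below. The 2-second figure is just a starting point, and note that requests only sees the HTML the server sends back; if the prices are filled in by JavaScript after that "processing" step, a sleep alone won't make them appear and the Selenium approach above is the better fit.

import time

import requests
from bs4 import BeautifulSoup

productdetails = []

for x in range(0, 87):
    response = requests.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    soup = BeautifulSoup(response.content, 'lxml')

    for element in soup.find_all('div', class_='product-content'):
        for link in element.find_all('a', class_='product-card-simple-title'):
            productdetails.append("Description: " + link.get_text().strip())

    # Pause between page requests so the server isn't hammered
    time.sleep(2)
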
buran write Sep-07-2023, 10:13 AM:
please, don't use triple backquotes to mark code, use python tags instead

