Python Forum
Is it possible to add a delay right after a request.get()
#1
I'm trying to build a scraper to get pricing and descriptions from this site, just for the men's shoes.
When you visit the site normally in a browser, the page loads, but then some sort of "processing" activity runs that makes the page unresponsive for 2 or 3 seconds before you can scroll or click on anything. This happens on every page as you navigate through the results, but it doesn't seem to happen on the individual detail pages.

Anyway, for the code below to work, I'm thinking some sort of delay will need to be added at some point to allow the page to load and let that process finish before trying to access and scrape it. I ran the exact same code on another shoe site and it processed roughly 230 results in about 30 seconds.

import requests
from bs4 import BeautifulSoup

# https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=0 is Page 1

productdetails = []

for x in range(0, 87):
    # Fetch one results page and parse it
    response = requests.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    soup = BeautifulSoup(response.content, 'lxml')

    element_list = soup.find_all('div', class_='product-content')
    for element in element_list:
        # Product description
        for link in element.find_all('a', class_='product-card-simple-title'):
            print("Description: " + link.get_text().strip())
            productdetails.append("Description: " + link.get_text().strip())
            # Price text, e.g. "54 dollars 99 cents" -> "54.99"
            for price in element.find_all('span', class_='sr-only'):
                price_text = price.get_text().strip().replace('\n', '').replace(' ', '').replace('dollars', '.').replace('cents', '')
                print("Price: " + price_text)
                productdetails.append("Price: " + price_text)
#2
Any update?
#3
For precise control, I would recommend Selenium. See this.
#4
Building a scraper can be challenging, especially when dealing with dynamic websites that have loading delays or use AJAX to load content. Given the "processing" activity you described, it sounds like the website might be using some form of lazy loading or client-side rendering, which affects when the data you want actually becomes available in the page.

To handle this in your scraper, you might need to use a tool like Selenium or Puppeteer, which allows for browser automation. These tools can mimic real user interactions, like waiting for a page to load fully before scraping the content.

Here's a basic approach using Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Initialize the browser
driver = webdriver.Chrome()

# Navigate to the website
driver.get('URL_OF_THE_WEBSITE')

# Wait for the "processing" activity to complete
wait = WebDriverWait(driver, 10) # wait for up to 10 seconds
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'CSS_SELECTOR_OF_AN_ELEMENT_YOU_WANT_TO_WAIT_FOR')))

# Now, you can scrape the content
content = driver.page_source

# Don't forget to close the browser once done
driver.quit()

Remember to replace 'URL_OF_THE_WEBSITE' with the actual URL and 'CSS_SELECTOR_OF_AN_ELEMENT_YOU_WANT_TO_WAIT_FOR' with a CSS selector of an element you know will be present after the "processing" activity.
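
If you want to keep your existing BeautifulSoup parsing, one option is to let Selenium render each results page and then hand driver.page_source to BeautifulSoup. Here's a rough sketch of that idea, assuming the product-content and product-card-simple-title class names from your first post are still correct for the rendered page:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
productdetails = []

for x in range(0, 87):
    # Let the browser load the page and wait until at least one product card is present
    driver.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.product-content')))

    # Hand the rendered HTML to BeautifulSoup and reuse the existing selectors
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for element in soup.find_all('div', class_='product-content'):
        for link in element.find_all('a', class_='product-card-simple-title'):
            productdetails.append("Description: " + link.get_text().strip())

driver.quit()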

Also, when building or using scrapers, always ensure you're respecting the website's robots.txt file and terms of service. Some sites might have restrictions against scraping, and you wouldn't want to inadvertently violate any terms.
Larz60+ write Aug-30-2023, 03:27 AM:
spam content removed
Also:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#5
Soon after posting my initial question above, I took a few days off, so no new updates yet, BUT I will be working on this again this week. Thank you for the suggestions; I will definitely work with the above sample to see if I can understand and work with Selenium.

You mentioned the robots.txt file. Is that something that has to be written into the Python code logic, or is it more a matter of reading that file to check for any restrictions?
#6
robots.txt will be found at the root of a website.
In your instance it can be found here.
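
To the other part of your question: you don't have to write anything special into your code; robots.txt is just a file you read to see which paths the site allows crawlers to fetch. If you do want to check it programmatically, the standard library's urllib.robotparser can do it. A minimal sketch (the '*' user agent is just a generic wildcard, and the page URL is the one from your first post):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://www.dickssportinggoods.com/robots.txt')
rp.read()

# True if the rules allow this user agent to fetch the given URL
url = 'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=2'
print(rp.can_fetch('*', url))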
#7
Yes, it is possible to add a delay right after making a request using the requests.get() function in Python. Adding a delay can be useful in situations where you want to control the rate at which you make requests to a web server, especially if you're scraping data or interacting with a web API that has rate-limiting policies.

You can introduce a delay using the time.sleep() function from the time module. Here's an example of how to add a delay of, let's say, 2 seconds after making a GET request:


import requests
import time

url = "https://example.com"
response = requests.get(url)

# Add a 2-second delay
time.sleep(2)

# Continue with your code after the delay

In this example, the time.sleep(2) line will pause the execution of your script for 2 seconds before moving on to the next line of code. You can adjust the delay duration by changing the argument to time.sleep() to suit your needs.

Keep in mind that while adding a delay can help you avoid overloading a server or violating rate limits, it will also make your script run slower. Balancing the delay duration is important to achieve the desired rate of requests while keeping your script reasonably efficient.
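
Applied to the paging loop from the first post, that would look something like the sketch below. The 2-second figure is just a starting point, and note that requests only sees the HTML the server sends back; if the prices are filled in by JavaScript after that "processing" step, a sleep alone won't make them appear and the Selenium approach above is the better fit.

import time

import requests
from bs4 import BeautifulSoup

productdetails = []

for x in range(0, 87):
    response = requests.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    soup = BeautifulSoup(response.content, 'lxml')

    for element in soup.find_all('div', class_='product-content'):
        for link in element.find_all('a', class_='product-card-simple-title'):
            productdetails.append("Description: " + link.get_text().strip())

    # Pause between page requests so the server isn't hammered
    time.sleep(2)
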
buran write Sep-07-2023, 10:13 AM:
please, don't use triple backquotes to mark code, use python tags instead

