Is it possible to add a delay right after a request.get()
Python Forum (https://python-forum.io) > Python Coding > General Coding Help > Thread: /thread-40600.html
Is it possible to add a delay right after a request.get() - cubangt - Aug-25-2023

I'm trying to build a scraper to get pricing and descriptions from this site, just for the men's shoes. When you visit the site normally in a browser, the page loads, but then some sort of "processing" activity occurs that makes the page inaccessible for 2 or 3 seconds before you can scroll or click on anything. This activity occurs on every page as you navigate the page results, though it doesn't seem to happen on the individual detail pages. Anyway, to get the code below working, I'm thinking some sort of delay will need to be added at some point, to allow the page to load and let that process run before trying to access and scrape the page. I ran the exact same code on another shoe site and it processed roughly 230 results in about 30 seconds.

```python
import requests
from bs4 import BeautifulSoup

# https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber=0 is page 1
productdetails = []

for x in range(0, 87):
    response = requests.get(f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}')
    soup = BeautifulSoup(response.content, 'lxml')
    element_list = soup.find_all('div', class_='product-content')
    for element in element_list:
        for link in element.find_all('a', class_='product-card-simple-title'):
            print("Description: " + link.get_text().strip())
            productdetails.append("Description: " + link.get_text().strip())
        for price in element.find_all('span', class_='sr-only'):
            price_text = (price.get_text().strip()
                          .replace('\n', '').replace(' ', '')
                          .replace('dollars', '.').replace('cents', ''))
            print("Price: " + price_text)
            productdetails.append("Price: " + price_text)
```

RE: Is it possible to add a delay right after a request.get() - KevinGomez - Aug-28-2023

Any update?

RE: Is it possible to add a delay right after a request.get() - Larz60+ - Aug-29-2023

For precise control, I would recommend Selenium. See this.

RE: Is it possible to add a delay right after a request.get() - alan_a - Aug-29-2023

Building a scraper can be challenging, especially when dealing with dynamic websites that have loading delays or use AJAX to load content. Given the "processing" activity you described, it sounds like the website might be using some form of lazy loading or client-side rendering. Such intricacies are worth being aware of, as they can affect both the user experience and the efficiency of data retrieval.

To handle this in your scraper, you might need a tool like Selenium or Puppeteer, which allows for browser automation. These tools can mimic real user interactions, like waiting for a page to load fully before scraping the content. Here's a basic approach using Selenium:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Initialize the browser
driver = webdriver.Chrome()

# Navigate to the website
driver.get('URL_OF_THE_WEBSITE')

# Wait (up to 10 seconds) for the "processing" activity to complete
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, 'CSS_SELECTOR_OF_AN_ELEMENT_YOU_WANT_TO_WAIT_FOR')))

# Now you can scrape the content
content = driver.page_source

# Don't forget to close the browser once done
driver.quit()
```

Remember to replace 'URL_OF_THE_WEBSITE' with the actual URL and 'CSS_SELECTOR_OF_AN_ELEMENT_YOU_WANT_TO_WAIT_FOR' with a CSS selector of an element you know will be present after the "processing" activity.

Also, when building or using scrapers, always ensure you're respecting the website's robots.txt file and terms of service.
Some sites might have restrictions against scraping, and you wouldn't want to inadvertently violate any terms.
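If you do stay with plain requests, the delay the thread title asks about can simply go inside the page loop. Below is a minimal sketch; the fetch_pages helper and its parameters are my own illustration, not from the posts above. The fetch callable is injectable so the pacing logic can be exercised without touching the network:

```python
import time

def fetch_pages(urls, delay=2.0, fetch=None):
    """Fetch each URL in turn, sleeping `delay` seconds between requests.

    `fetch` defaults to requests.get; it is injectable so the pacing
    can be tested without network access.
    """
    if fetch is None:
        import requests
        fetch = requests.get
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Be polite: pause before every request after the first.
            time.sleep(delay)
        results.append(fetch(url))
    return results
```

With the URLs from the original post this would be something like `fetch_pages([f'https://www.dickssportinggoods.com/f/all-mens-footwear?pageNumber={x}' for x in range(0, 87)])`. Note, though, that a fixed sleep only slows your request rate; it does not make JavaScript-rendered content appear in the response, which is where Selenium comes in.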
RE: Is it possible to add a delay right after a request.get() - cubangt - Aug-29-2023

Soon after posting my initial question above, I took a few days off, so no new updates yet. But I will be working on this again this week. Thank you for the suggestions; I will definitely work with the sample above to see if I can understand and work with Selenium. You mentioned the robots.txt file. Is that something that has to be written into the Python code logic, or is it more a matter of reading that file to check for any restrictions?

RE: Is it possible to add a delay right after a request.get() - Larz60+ - Aug-30-2023

robots.txt will be found at the root of a website. In your instance it can be found here.

RE: Is it possible to add a delay right after a request.get() - shoesinquiry - Sep-07-2023

Yes, it is possible to add a delay right after making a request with the requests.get() function in Python. Adding a delay can be useful when you want to control the rate at which you make requests to a web server, especially if you're scraping data or interacting with a web API that has rate-limiting policies. You can introduce a delay using the time.sleep() function from the time module. Here's an example of how to add a delay of, say, 2 seconds after making a GET request:

```python
import requests
import time

url = "https://example.com"
response = requests.get(url)

# Add a 2-second delay
time.sleep(2)

# Continue with your code after the delay
```

In this example, the time.sleep(2) line pauses the execution of your script for 2 seconds before moving on to the next line of code. You can adjust the delay duration by changing the argument to time.sleep() to suit your needs. Keep in mind that while adding a delay can help you avoid overloading a server or violating rate limits, it will also make your script run slower. Balancing the delay duration is important to achieve the desired request rate while keeping your script efficient.
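On the robots.txt question above: you don't have to hand-write any parsing logic; Python's standard library can read and apply the rules for you via urllib.robotparser. A minimal sketch (the rules below are a made-up example, not the actual site's robots.txt):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Normally you would point it at the live file:
#   rp.set_url('https://www.example.com/robots.txt'); rp.read()
# Here, hypothetical rules are parsed directly for illustration:
rp.parse("""\
User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines())

print(rp.can_fetch("*", "https://www.example.com/f/all-mens-footwear"))  # True: not disallowed
print(rp.can_fetch("*", "https://www.example.com/private/page"))         # False: under /private/
print(rp.crawl_delay("*"))  # 5: seconds the site asks crawlers to wait between requests
```

If the file declares a Crawl-delay, that value is a natural choice for the time.sleep() argument in the scraping loop.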