I am writing a web-scraping script where I have a list of URLs to scrape, and I iterate through them in a for loop. These URLs all return the same page structure but with different data, so I use the same code to scrape each one. My problem is that when I run the script, it returns a 404 HTML error for every URL in the list, and I think it's because the for loop is running faster than 'session.get()' can return the pages. I may be wrong; please advise what you think the problem is. Please see the code below.
import os
import requests
from datetime import datetime

session = requests.Session()
url_list = list(current_urls_set)
if len(url_list) > 0:
    # Log in first so the session carries the auth cookies.
    payload = {
        'UserName': 'myemail',
        'Password': 'mypassword'
    }
    session.post('website url here', data=payload)

    # Create a dated output directory if it does not exist yet.
    rfq_dir = 'C:/Projects/TenderBot_Python/Tenders_and_RFQs/cpt/RFQs/{}'.format(
        datetime.today().strftime('%d-%m-%Y'))
    if not os.path.exists(rfq_dir):
        os.mkdir(rfq_dir)

    for url in url_list:
        # Each entry is comma-separated; index 1 is the closing date,
        # index 3 is the path fragment for the RFQ page.
        data_list = url.split(',')
        closing_date = datetime.strptime(data_list[1], '%m/%d/%Y %I:%M:%S %p')
        if closing_date > datetime.now():
            rfqHtml = session.get('website url here{}'.format(data_list[3].strip()))
            print(rfqHtml.text)
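One point worth checking: `session.get()` is a blocking call, so the for loop cannot "outrun" it; each iteration waits until the response comes back. A 404 on every URL usually means the request URL itself is wrong, often from un-stripped whitespace or a bad base URL. A minimal sketch of inspecting what URL actually gets built, using a hypothetical sample entry and `https://example.com` as a placeholder base (both are assumptions, not taken from the real site):

```python
from datetime import datetime

# Hypothetical sample entry in the same comma-separated shape the loop expects:
# description, closing date, reference, path fragment (with stray spaces)
sample = 'RFQ 123,12/31/2030 05:00:00 PM,REF-1, /rfq/view/123 '

data_list = sample.split(',')
closing_date = datetime.strptime(data_list[1], '%m/%d/%Y %I:%M:%S %p')

# Build the full URL; the base here is a placeholder, not the real site.
base = 'https://example.com'
full_url = '{}{}'.format(base, data_list[3].strip())

# Printing the built URL (or response.url after the request) shows exactly
# what the server was asked for; un-stripped spaces would 404 on most servers.
print(full_url)
```

After the real `session.get()` call, printing `rfqHtml.status_code` and `rfqHtml.url` (or calling `rfqHtml.raise_for_status()`) will confirm whether the login succeeded and which exact URL returned the 404.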