MaxRetryError while scraping a website multiple times - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: MaxRetryError while scraping a website multiple times (/thread-20715.html)
MaxRetryError while scraping a website multiple times - kawasso - Aug-26-2019

Hi, I have been trying to retrieve some data from a website. My initial test, retrieving the data a single time, worked as expected, but when I try to get the data from two or more links on the same website I receive the error below. I am new to web scraping; I am doing it with BeautifulSoup, Requests, Selenium and Pandas, and I also added a timer to sleep between queries. Any idea of the root cause and a possible workaround? Thanks
RE: MaxRetryError while scraping a website multiple times - Larz60+ - Aug-27-2019

Please show your code.

RE: MaxRetryError while scraping a website multiple times - kawasso - Aug-27-2019

The code that I am using is below. Note that it works for one link at a time; for more than one link it fails with this error:

MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=57192): Max retries exceeded with url: /session/dcba731d4173518f03b593a17afe111c/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000001DD4212470>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

from bs4 import BeautifulSoup
import requests, io
import pandas as pd
from selenium import webdriver
import time

driver = webdriver.Chrome(executable_path=r"myfolder\chromedriver.exe")
uchar = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','<','>',',']
timestamp = pd.datetime.today().strftime('%Y%m%d-&H&M&S')

links_df = pd.read_excel(r'myfolder\myfile.xlsx', sheetname='Hoja1')
links_Df = links_df[(links_df['Country'] == 'PT')]
results = pd.DataFrame(columns=['ISIN', 'N Shares', 'Link'])

for ISIN in links_df.ISIN:
    link = 'https://www.bolsadelisboa.com.pt/pt-pt/products/equities/' + ISIN + '-XLIS/market-information'
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()
    r = soup.find_all("strong")[14]
    dirtyresult = str(r)
    for x in uchar:
        cleanresult = dirtyresult.replace(x, "").replace("<strong>", "").replace("</strong>", "")
    time.sleep(30)
    results = results.append({'ISIN': ISIN, 'N Shares': cleanresult, 'Link': link}, ignore_index=True)
    print(ISIN + ": " + cleanresult)

results.to_csv(r'myfolder\output' + timestamp + '.csv', index=False)
print('Finish')

RE: MaxRetryError while scraping a website multiple times - Larz60+ - Aug-28-2019

The error indicates that the server will not allow multiple access; you probably have to close the first connection before the second one is attempted.

RE: MaxRetryError while scraping a website multiple times - kawasso - Aug-28-2019

Hi Larz, isn't the connection closed by the line below?

driver.quit()

RE: MaxRetryError while scraping a website multiple times - Larz60+ - Aug-29-2019

Yes, I believe so. So you need to restart the browser for the next iteration.

RE: MaxRetryError while scraping a website multiple times - kawasso - Aug-29-2019

OK, it looks like it works if I put the line below at the start of each iteration:

driver = webdriver.Chrome(executable_path=r"myfolder\chromedriver.exe")

Thanks
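The fix the thread converges on, starting a fresh driver inside each iteration and quitting it once the page has been captured, can be sketched as below. `FakeDriver` and `scrape` are hypothetical stand-ins introduced here so the sketch runs without a browser; in the real script, `make_driver` would be something like `lambda: webdriver.Chrome(executable_path=r"myfolder\chromedriver.exe")`.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver: one session per instance."""
    def __init__(self):
        self.alive = True

    def get(self, url):
        if not self.alive:
            # Mirrors the MaxRetryError: a quit driver cannot serve another get()
            raise RuntimeError("session is closed")
        self.page_source = "<strong>" + url + "</strong>"

    def quit(self):
        self.alive = False

def scrape(links, make_driver):
    pages = []
    for link in links:
        driver = make_driver()            # new session for each link...
        driver.get(link)
        pages.append(driver.page_source)
        driver.quit()                     # ...closed once the page is captured
    return pages

pages = scrape(["AAA", "BBB"], FakeDriver)
print(len(pages))  # -> 2
```

Note this relaunches the browser once per link; the usual alternative is to create the driver once before the loop, drop the in-loop `driver.quit()`, and call `quit()` a single time after the loop, which avoids the repeated startup cost.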