 MaxRetryError while scraping a website multiple times
#1
Hi,

I have been trying to retrieve some data from a website. My initial test, retrieving data from a single link, worked as expected, but when I try to get data from two or more links on the same website I receive the error below.
I am new to web scraping, and I am doing it with BeautifulSoup, Requests, Selenium and Pandas. I also added a timer to sleep between queries. Any idea what the root cause is, and is there a possible workaround?


Error:
MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=57192): Max retries exceeded with url: /session/dcba731d4173518f03b593a17afe111c/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000001DD4212470>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
Thanks
#2
Please show your code.
#3
The code that I am using for this is below:

from bs4 import BeautifulSoup
import requests, io
import pandas as pd
from selenium import webdriver
import time

################## NOTE THAT THIS CODE WORKS FOR 1 LINK AT A TIME, FOR MORE THAN ONE IT FAILS
############# error: MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=57192): Max retries exceeded with url: /session/dcba731d4173518f03b593a17afe111c/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000001DD4212470>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

driver = webdriver.Chrome(executable_path=r"myfolder\chromedriver.exe")
uchar=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','<','>',',']
timestamp = pd.Timestamp.today().strftime('%Y%m%d-%H%M%S')  # was '&H&M&S', which is not a valid time format

links_df = pd.read_excel(r'myfolder\myfile.xlsx', sheet_name='Hoja1')
links_df = links_df[links_df['Country'] == 'PT']  # was assigned to links_Df, so the filter was never used

results = pd.DataFrame(columns=['ISIN', 'N Shares', 'Link'])

for ISIN in links_df.ISIN:
    link='https://www.bolsadelisboa.com.pt/pt-pt/products/equities/' + ISIN + '-XLIS/market-information'
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    driver.quit()
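    r=soup.find_all("strong")[14]
    dirtyresult=str(r)
    # Strip the tags first, then remove the unwanted characters one by one,
    # accumulating into cleanresult instead of restarting from dirtyresult each time.
    cleanresult=dirtyresult.replace("<strong>","").replace("</strong>","")
    for x in uchar:
        cleanresult=cleanresult.replace(x,"")
    time.sleep(30)

    # Store one row per ISIN, inside the loop.
    results.loc[len(results)] = [ISIN, cleanresult, link]
    print(ISIN +": " + cleanresult)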
    
results.to_csv(r'myfolder\output' + timestamp + '.csv', index=False)

print('Finish')
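As an aside, the character-by-character cleanup could be replaced by a small helper. This is only a sketch, assuming the value inside the <strong> tag is a number of shares and that only its digits are wanted; the name extract_digits is made up for illustration and is not part of the code above.

```python
import re


def extract_digits(html_fragment):
    """Return only the digits found in the text of an HTML fragment.

    Assumption: the wanted value is an integer, so separators such as
    commas, dots and spaces can be discarded along with the markup.
    """
    text = re.sub(r"<[^>]*>", "", html_fragment)  # drop tags such as <strong>
    return re.sub(r"\D", "", text)                # drop every non-digit character
```

It would be called as extract_digits(str(r)) in place of the loop over uchar.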
#4
The error indicates that the server will not allow multiple connections at once.
You probably have to close the first connection before the second is attempted.
#5
Hi Larz,

Isn't the connection closed by the code below?

driver.quit()
#6
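Yes, I believe so. driver.quit() closes the browser and also shuts down the ChromeDriver process that Selenium talks to on 127.0.0.1, which is why the next driver.get() call is refused.
So you need to start the browser again for the next iteration.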
#7
OK, it looks like it works if I create the driver again on each iteration, by putting the code below inside the loop:

driver = webdriver.Chrome(executable_path=r"myfolder\chromedriver.exe")
Thanks
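
An alternative to restarting the browser for every link is to create the driver once, reuse it for all links, and call quit() only after the last one. The following is a minimal sketch of that idea, not tested against the site. The names fetch_pages and make_chrome_driver are made up for illustration, and webdriver.Chrome() with no arguments assumes Selenium can locate chromedriver by itself (otherwise pass the path the way your Selenium version expects).

```python
import time

# URL pattern taken from the loop in post #3.
BASE_URL = ("https://www.bolsadelisboa.com.pt/pt-pt/products/equities/"
            "{isin}-XLIS/market-information")


def fetch_pages(isins, make_driver, delay_seconds=30):
    """Return {isin: page_source}, using a single driver for every link."""
    driver = make_driver()
    pages = {}
    try:
        for i, isin in enumerate(isins):
            if i:  # pause between requests, but not before the first one
                time.sleep(delay_seconds)
            driver.get(BASE_URL.format(isin=isin))
            pages[isin] = driver.page_source
    finally:
        driver.quit()  # called once, after the last link (or on error)
    return pages


def make_chrome_driver():
    # Imported here so fetch_pages can be tried out without Selenium installed.
    # Assumption: chromedriver can be found without an explicit path.
    from selenium import webdriver
    return webdriver.Chrome()
```

It could then be used as pages = fetch_pages(links_df.ISIN, make_chrome_driver), with each page source passed to BeautifulSoup afterwards. Keeping one browser open avoids the startup cost on every link, and the try/finally makes sure the browser is still closed if one of the pages fails.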