Python Forum

Error with crawling site
Hi, I am getting the following exception when making a request to the site below.
URL: http://www.sec.gov/cgi-bin/browse-edgar?...&count=100
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

I have no issues connecting to it directly, i.e. via a web browser.
I am behind a proxy, but I am passing it to the request object. Please advise.
Did you supply a User-Agent header that identifies a browser?
Yes. Here is the code.
import requests
from time import sleep

http_proxy = "xxxxx"

proxyDict = {
    "http": http_proxy,
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

for i in range(20):
    r = None
    try:
        # base_url is the EDGAR URL quoted above
        r = requests.get(base_url, proxies=proxyDict, headers=headers)
        data = r.text
    except Exception as e:
        print('Error getting requests')
        print(e.__doc__)
        print(e)  # exceptions have no .message attribute in Python 3

    sleep(20)

I used fake_useragent as well; same error message.
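
For reference, this is roughly how the fake_useragent attempt looked; a minimal sketch using the library's UserAgent().random property to rotate browser strings:

from fake_useragent import UserAgent

ua = UserAgent()
# each access to .random returns a different real-browser User-Agent string
headers = {'User-Agent': ua.random}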
You are sending requests as fast as the system allows. Put some random delay between requests; some servers don't tolerate bots.
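
A minimal sketch of that idea, reusing the base_url, proxyDict, and headers from the earlier post, with the delay amount chosen arbitrarily:

import random
import time

import requests

for i in range(20):
    try:
        r = requests.get(base_url, proxies=proxyDict, headers=headers, timeout=30)
        data = r.text
    except requests.exceptions.RequestException as e:
        print('Error getting request:', e)
    # random delay so the traffic pattern looks less bot-like
    time.sleep(random.uniform(10, 30))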