Python Forum
Error with crawling site.
#1
Hi, I am getting the following exception when making a request to the site below.
URL: http://www.sec.gov/cgi-bin/browse-edgar?...&count=100
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

I have no issues connecting to it directly, i.e. via a web browser.
I am behind a proxy, but I am passing it to the request object. Please advise.
#2
Did you supply a User-Agent header that identifies as a browser?
#3
Yes. Here is the code.
import requests
from time import sleep

base_url = "http://www.sec.gov/cgi-bin/browse-edgar?...&count=100"  # query string truncated
http_proxy = "xxxxx"

proxyDict = {
    "http": http_proxy,
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

for i in range(20):
    r = None
    try:
        r = requests.get(base_url, proxies=proxyDict, headers=headers)
        data = r.text
    except Exception as e:
        # e.message does not exist in Python 3; print the exception itself
        print('Error getting requests')
        print(e)
    sleep(20)

I tried fake_useragent as well; same error message.
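For reference, a minimal sketch of how fake_useragent would slot into the loop above (this assumes the library's standard UserAgent().random call, which returns a random real-browser User-Agent string; base_url and proxyDict are the ones from my snippet):

from fake_useragent import UserAgent
import requests

ua = UserAgent()
# pick a fresh random browser User-Agent string for the request
headers = {'User-Agent': ua.random}
r = requests.get(base_url, proxies=proxyDict, headers=headers)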
#4
You are sending requests as fast as the system allows. Put some random delay between them; some servers block clients that hit them like bots.
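Something along these lines, reusing base_url, proxyDict, and headers from your snippet (a minimal sketch; the 5-15 second bounds are arbitrary):

import random
import requests
from time import sleep

for i in range(20):
    try:
        r = requests.get(base_url, proxies=proxyDict, headers=headers)
        data = r.text
    except requests.exceptions.RequestException as e:
        print('Request failed:', e)
    # sleep a random interval so the requests are not perfectly periodic
    sleep(random.uniform(5, 15))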
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org