Python Forum
Error with crawling site.
#1
Hi, I am getting the following exception when making a request to the site below.
URL: http://www.sec.gov/cgi-bin/browse-edgar?...&count=100
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

I have no issues connecting to it directly, i.e. via a web browser.
I am behind a proxy, but I am passing it to the request object. Please advise.
#2
Did you supply a User-Agent header that identifies as a browser?
#3
Yes. Here is the code.
import requests
from time import sleep

base_url = "http://www.sec.gov/cgi-bin/browse-edgar?...&count=100"  # query string truncated
http_proxy = "xxxxx"

proxyDict = {
    "http": http_proxy,
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

for i in range(20):
    r = None
    try:
        r = requests.get(base_url, proxies=proxyDict, headers=headers)
        data = r.text
    except Exception as e:
        # e.message does not exist in Python 3; print the exception itself
        print('Error getting requests')
        print(e)
    sleep(20)

I tried fake_useragent as well; same error message.
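For reference, a minimal sketch of how fake_useragent would slot into the loop above (this assumes the library's standard UserAgent().random call, which returns a random real-browser User-Agent string; base_url and proxyDict are the ones from my snippet):

from fake_useragent import UserAgent
import requests

ua = UserAgent()
# pick a fresh random browser User-Agent string for the request
headers = {'User-Agent': ua.random}
r = requests.get(base_url, proxies=proxyDict, headers=headers)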
#4
You are sending requests as fast as the system allows. Put some random delay between them; some servers block clients that hit them like bots.
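Something along these lines, reusing base_url, proxyDict, and headers from your snippet (a minimal sketch; the 5-15 second bounds are arbitrary):

import random
import requests
from time import sleep

for i in range(20):
    try:
        r = requests.get(base_url, proxies=proxyDict, headers=headers)
        data = r.text
    except requests.exceptions.RequestException as e:
        print('Request failed:', e)
    # sleep a random interval so the requests are not perfectly periodic
    sleep(random.uniform(5, 15))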
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org