Apr-24-2021, 08:02 AM
I'm using urllib.request to read pages for scraping. Things were connecting and reading OK, but I wanted to add some exception handling in case one of my rotating proxies was bad. So I made a bad proxy to test:
'1.1.1.1:3'
and expected things to fail... but it succeeded instead. Hmmm. OK, then I used 'xxxx' as the proxy and it still pulled the page - so clearly I'm doing something wrong. I tried to use Fiddler to see if it was using my IP and port instead, but I could not see that info in Fiddler. Here's my code:
import urllib.request

proxy_handler = urllib.request.ProxyHandler({'http': 'xxxx'})  # 'xxxx' was '161.35.4.201:80'
opener = urllib.request.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
page = urllib.request.urlopen(self.url)
data = page.read()

What's the point of urllib.request.ProxyHandler if it doesn't use the proxy I pass into it?
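In case it helps, here is a quick offline check I ran (no network needed) to see what ProxyHandler actually registers - the '1.1.1.1:3' stand-in is the same bad proxy from above, and `proxies` is the mapping attribute the handler keeps:

```python
import urllib.request

# Build a handler with the deliberately bad proxy from above.
# ProxyHandler stores the dict you pass in, keyed by URL scheme.
proxy_handler = urllib.request.ProxyHandler({'http': '1.1.1.1:3'})
print(proxy_handler.proxies)  # {'http': '1.1.1.1:3'}

# The handler only applies to URLs whose scheme matches a key in the
# mapping, so an https:// URL would bypass an 'http'-only entry entirely.
print('https' in proxy_handler.proxies)  # False
```

So the mapping is definitely getting set; I'm just not sure why the request doesn't go through it.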
Any help would be great. Thanks.