Python Forum

Open URL via proxy - perseus142 - Jun-02-2020

Hello team,
I am a newbie and I am looking for help.
I would like to access a webpage and retrieve data. Sounds easy, right?
However, I have to use a proxy, and this is where I am stuck :(
I have already tried some Google searches and Stack Overflow advice, but I am still getting errors.

For example: https://stackoverflow.com/questions/34576665/setting-proxy-to-urllib-request-python3

My code #1:
import urllib.request as request

proxy_handler = request.ProxyHandler({'http': '<proxy omitted>'})
opener = request.build_opener(proxy_handler)
url = 'http://data.pr4e.org/romeo.txt'

# open the website with the opener
req = opener.open(url)
data = req.read().decode('utf8')
print(data)
Error:
PS C:\Users\<user>\Desktop> python .\week4.py
Traceback (most recent call last):
  File ".\week4.py", line 33, in <module>
    req = opener.open(url)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

My code #2:
from urllib import request as urlrequest

proxy_host = '<proxy omitted>'    # host and port of your proxy
url = 'http://data.pr4e.org/romeo.txt'

req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'http')

response = urlrequest.urlopen(req)
print(response.read().decode('utf8'))
Error:
PS C:\Users\<user>\Desktop> python .\week4.py
Traceback (most recent call last):
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 865, in _get_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: '8001/one-de-vpn.pac'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\week4.py", line 29, in <module>
    response = urlrequest.urlopen(req)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1348, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1288, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 829, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 870, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: '8001/one-de-vpn.pac'

The website http://data.pr4e.org/romeo.txt is accessible through the proxy when I use a browser.

Please advise.

Thank you.


RE: Open URL via proxy - perseus142 - Jun-17-2020

Please disregard and delete the topic.


RE: Open URL via proxy - micseydel - Jun-18-2020

We don't delete topics, but if you found the solution we'd appreciate you sharing it with us here in case someone finds it helpful in the future.
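
For anyone who finds this thread later: the original poster never shared a fix, so the following is only a guess based on the tracebacks. The second error shows the proxy setting ended in '8001/one-de-vpn.pac', which looks like the URL of a proxy auto-config (PAC) file and not a proxy address. A browser downloads and evaluates a PAC file to choose a proxy; urllib does not, and expects a plain host:port. Below is a minimal sketch under that assumption. The helper names proxy_hostport and build_proxy_opener are hypothetical (not part of urllib), and proxy.example.com:8080 is a placeholder for whatever proxy the PAC file actually points to.

```python
import urllib.request
from urllib.parse import urlsplit


def proxy_hostport(value):
    """Return 'host:port' from a proxy setting, rejecting PAC-style URLs.

    Hypothetical helper for illustration; it is not part of urllib.
    """
    # urlsplit only recognises the host part when the string contains '//'.
    parts = urlsplit(value if '//' in value else '//' + value)
    if parts.path not in ('', '/'):
        raise ValueError(
            f'proxy setting has a path ({parts.path!r}); expected only host:port'
        )
    # parts.port itself raises ValueError when the port is not numeric.
    if parts.hostname is None or parts.port is None:
        raise ValueError('proxy setting must include a host and a numeric port')
    return f'{parts.hostname}:{parts.port}'


def build_proxy_opener(proxy):
    """Build an opener that routes http and https requests through the proxy."""
    hostport = proxy_hostport(proxy)
    handler = urllib.request.ProxyHandler({
        'http': f'http://{hostport}',
        'https': f'http://{hostport}',
    })
    return urllib.request.build_opener(handler)
```

With a real proxy address in place of the placeholder, build_proxy_opener('proxy.example.com:8080').open('http://data.pr4e.org/romeo.txt') would then be used the same way as opener.open(url) in code #1. The proxy address itself usually appears inside the PAC file, in a return value of the form "PROXY host:port".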