Jun-02-2020, 04:52 PM
Hello team,
I am a newbie and I am looking for help.
I would like to access a webpage and retrieve data. Sounds easy, right?
However, I have to use a proxy, and this is where I am stuck :(
I have already tried suggestions from Google and Stack Overflow, but I am still getting errors.
For example: https://stackoverflow.com/questions/3457...st-python3
My code #1:
import urllib.request as request

proxy_handler = request.ProxyHandler({'http': '<proxy omitted>'})
opener = request.build_opener(proxy_handler)
url = 'http://data.pr4e.org/romeo.txt'

# open the website with the opener
req = opener.open(url)
data = req.read().decode('utf8')
print(data)

Error:
PS C:\Users\<user>\Desktop> python .\week4.py
Traceback (most recent call last):
File ".\week4.py", line 33, in <module>
req = opener.open(url)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
My code #2:
from urllib import request as urlrequest

proxy_host = '<proxy omitted>'  # host and port of your proxy
url = 'http://data.pr4e.org/romeo.txt'

req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'http')
response = urlrequest.urlopen(req)
print(response.read().decode('utf8'))

Error:
PS C:\Users\<user>\Desktop> python .\week4.py
Traceback (most recent call last):
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 865, in _get_hostport
port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: '8001/one-de-vpn.pac'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\week4.py", line 29, in <module>
response = urlrequest.urlopen(req)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1348, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1288, in do_open
h = http_class(host, timeout=req.timeout, **http_conn_args)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 829, in __init__
(self.host, self.port) = self._get_hostport(host, port)
File "C:\Users\<user>\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 870, in _get_hostport
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: '8001/one-de-vpn.pac'
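One thing I notice in the second traceback: the text that fails to parse as a port is '8001/one-de-vpn.pac', so the proxy value I am passing looks like the address of a PAC (proxy auto-config) file rather than a plain host:port. As far as I know, urllib does not evaluate PAC files and expects the address of the proxy server itself. Below is a minimal sketch of a check for that, assuming a made-up host proxy.example.com (my real proxy is omitted) and two helper names, proxy_address and build_proxy_opener, that I invented for illustration. It only validates the setting and builds the opener; the real proxy host and port would still have to be read out of the PAC file or obtained from the network admin.

```python
import urllib.request as request
from urllib.parse import urlsplit


def proxy_address(setting):
    """Return 'host:port' from a proxy setting, or raise ValueError.

    Hypothetical helper: rejects values that carry a path, such as the
    URL of a PAC file, because urllib needs a plain host and port.
    """
    # Prefix '//' when no scheme is given so urlsplit sees a network location.
    parts = urlsplit(setting if "//" in setting else "//" + setting)
    if parts.path not in ("", "/"):
        raise ValueError(
            "proxy setting has a path (%r); it may be a PAC file URL" % parts.path
        )
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy setting needs both a host and a numeric port")
    return "%s:%d" % (parts.hostname, parts.port)


def build_proxy_opener(setting):
    """Build a urllib opener that sends http requests through the proxy."""
    address = proxy_address(setting)
    handler = request.ProxyHandler({"http": "http://" + address})
    return request.build_opener(handler)
```

With a plain address, for example build_proxy_opener('proxy.example.com:8080').open(url), the request would go through that (made-up) proxy. With a value ending in '/one-de-vpn.pac' the helper raises ValueError up front instead of failing inside http.client.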
The website http://data.pr4e.org/romeo.txt is accessible through the proxy when I use a browser.
Please advise.
Thank you.