Hello Python experts,
when I try to run this small code snippet:
import urllib.request
scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
    for line in ip_file:
        count += 1
print(count)
I get an exception message like this:
request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable
What could be the problem? I have checked, and the URL is valid.
Did you try catching the exception and checking the reason field (see the docs for HTTPError)? It's not clear from the status code alone (the 406) why the request was rejected, so I imagine the reason will give you more info.
Yes, I did try that with the following code:
import urllib.request
scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
try:
    with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)
The output is: "Not Acceptable"
I found the following description in the documentation:
406: ('Not Acceptable', 'URI not available in preferred format.'),
However, the following code seems to work properly:
import requests
scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
response = requests.get(scarlet_pimpernel_link)
count = 0
for line in response.iter_lines():
    count += 1
print(count)
So here I'm importing a different module. How can I find out what the problem is?
Can you run the first code snippet, the one that raises the exception?
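One way to dig further is to look at more than the reason phrase. The sketch below collects the status code, the response headers and the start of the error body from an HTTPError. The helper names describe_http_error and make_sample_error are made up for illustration, and the sample error is built by hand so the code runs without network access; a real 406 response from the server may carry different headers and a different body.

```python
import email.message
import io
import urllib.error


def describe_http_error(err):
    """Collect the details an HTTPError carries beyond its reason phrase."""
    # Read the first bytes of the error page, if the server sent one
    body = err.read(500)
    return {
        "code": err.code,
        "reason": err.reason,
        "headers": dict(err.headers.items()),
        "body_start": body.decode("utf-8", errors="replace"),
    }


def make_sample_error():
    """Build a hand-made 406 error (hypothetical stand-in for a server reply)."""
    headers = email.message.Message()
    headers["Content-Type"] = "text/html"
    return urllib.error.HTTPError(
        "https://example.invalid/",
        406,
        "Not Acceptable",
        headers,
        io.BytesIO(b"<html>Not Acceptable</html>"),
    )


for key, value in describe_http_error(make_sample_error()).items():
    print(f"{key}: {value}")
```

In the original snippet, the same helper could be called inside the except urllib.error.HTTPError as e: branch as describe_http_error(e). The headers and body sometimes say more about why the server refused the request than the bare "Not Acceptable".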
Always use Requests rather than urllib, and you will avoid errors like this.
It is also better to download the whole file as binary than to read it line by line from the website with Requests.
The line splitting may differ too, because the line endings will be \r\n and not only \n if you download the file as binary and then open it.
import requests
scarlet_pimpernel_link = 'https://gutenberg.org/cache/epub/60/pg60.txt'
response = requests.get(scarlet_pimpernel_link)
with open('pg60.txt', 'wb') as f_web:
    f_web.write(response.content)
with open('pg60.txt', encoding='utf-8') as f:
    for line in f:
        print(line)
        # print(repr(line))  # Shows everything, e.g. the newline characters
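To illustrate why line counts can differ between approaches, here is a small offline sketch. It counts the lines of the same bytes in four ways: iterating a binary stream (which is roughly what iterating the urllib response does), bytes.splitlines() (which, as far as I know, Requests' iter_lines relies on), iterating in text mode, and str.splitlines(). The sample data and the function name count_lines_each_way are made up for illustration and are not taken from the Gutenberg file.

```python
import io


def count_lines_each_way(data: bytes) -> dict:
    """Count the lines in `data` using four common splitting strategies."""
    text = data.decode("utf-8")
    return {
        # Iterating a binary stream splits on b"\n" only
        "binary_iteration": sum(1 for _ in io.BytesIO(data)),
        # bytes.splitlines() also treats a lone \r as a line break
        "bytes_splitlines": len(data.splitlines()),
        # Text mode with universal newlines treats \n, \r and \r\n as breaks
        "text_mode_iteration": sum(
            1 for _ in io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
        ),
        # str.splitlines() additionally splits on characters such as form feed
        "str_splitlines": len(text.splitlines()),
    }


# Made-up sample containing \r\n, a lone \r and a form feed (\x0c)
SAMPLE = b"one\r\ntwo\rtwo-b\r\nthree\x0cthree-b\r\n"
print(count_lines_each_way(SAMPLE))
```

For this sample the four counts are 3, 4, 4 and 5, so a stray \r or form feed in the downloaded text would be enough to make two programs disagree slightly.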
(Oct-25-2020, 04:22 PM) snippsat wrote: Always use Requests rather than urllib, and you will avoid errors like this.
Hello snippsat,
There is a slightly different number of lines if I compare your solution with mine (using requests).
The actual problem is that I'm using the code snippet from the ebook and it should work.
The book can be seen here:
https://www.dbooks.org/python-regex-1590/
The exercise is on page 17.
I'd just like to add that this program runs without any exception:
import urllib.request
sp_link = r'https://www.kernel.org/doc/readme/drivers-staging-lustre-README.txt'
count = 0
try:
    with urllib.request.urlopen(sp_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)
I cannot understand what the problem is. It must be something about the URL.
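Nothing in this thread confirms the cause, so the following is only a guess to experiment with. By default urllib identifies itself as Python-urllib/x.y and sends no Accept header, while Requests sends different defaults, and some servers answer differently depending on those headers. The sketch below builds a urllib request with explicit headers. The header values and the function names build_request and count_lines are made up for illustration, and whether they satisfy this particular server is untested.

```python
import urllib.request

# Hypothetical header values, chosen for illustration only
EXAMPLE_HEADERS = {
    "User-Agent": "line-counter-example/0.1",
    "Accept": "text/plain, */*",
}


def build_request(url, headers=None):
    """Return a Request that carries explicit headers instead of the defaults."""
    return urllib.request.Request(url, headers=headers or EXAMPLE_HEADERS)


def count_lines(url):
    """Open the URL with the explicit headers and count its lines."""
    with urllib.request.urlopen(build_request(url)) as ip_file:
        return sum(1 for _ in ip_file)
```

Calling count_lines(scarlet_pimpernel_link) would then show whether the 406 goes away. If it does, the headers were the difference; if not, the cause lies elsewhere.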