Python Forum
Cannot open url link using urllib.request - Printable Version



Cannot open url link using urllib.request - Askic - Oct-25-2020

Hello Python experts,

when trying to execute this little code snippet
import urllib.request

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
    for line in ip_file:
        count += 1
print(count)
I get an exception message, something like:
request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable

What could be the problem? I have checked, and the URL is valid.


RE: Cannot open url link using urllib.request - ndc85430 - Oct-25-2020

Did you try catching the exception and checking the reason field (docs for HTTPError are here)? It's not clear just from the status code (the 406) why the request was bad, so I imagine the reason will give you more info.
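
A minimal sketch of that kind of inspection, using only the standard library. The helper name describe_http_error is made up for illustration, and the error object in the demo is built by hand with a placeholder URL and header so the snippet runs offline. Besides the reason field, it prints the status code, the short description Python ships for known status codes, and the response headers, which may or may not reveal more.

```python
import email.message
import http
import io
import urllib.error


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Summarize status code, reason and response headers of an HTTPError."""
    lines = [f"status: {err.code}", f"reason: {err.reason}"]
    try:
        # The standard library carries a short description for known codes.
        lines.append(f"meaning: {http.HTTPStatus(err.code).description}")
    except ValueError:
        # Non-standard status code: nothing to look up.
        pass
    # Response headers sometimes hint at what rejected the request.
    for name, value in err.headers.items():
        lines.append(f"header: {name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-built error (hypothetical URL and header), no network needed.
    headers = email.message.Message()
    headers["Content-Type"] = "text/html"
    fake = urllib.error.HTTPError(
        "https://example.invalid/", 406, "Not Acceptable", headers, io.BytesIO(b"")
    )
    print(describe_http_error(fake))
```

In real use the function would be called from an except urllib.error.HTTPError as e: block with e as the argument.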


RE: Cannot open url link using urllib.request - Askic - Oct-25-2020

Yes, I did try that with the following code:
import urllib.request

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
try:
    with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)
The output is: "Not Acceptable"

I found the following description in the documentation:
406: ('Not Acceptable', 'URI not available in preferred format.'),

However, the following code seems to work properly:
import requests

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'

response = requests.get(scarlet_pimpernel_link)
count = 0
for line in response.iter_lines():
    count += 1
print(count)
So it works when I import a different module. How can I find out what the problem is?

Can you run the first code snippet, the one that raises the exception?


RE: Cannot open url link using urllib.request - snippsat - Oct-25-2020

Always use Requests instead of urllib, and you will avoid errors like this.
It is also better to download the whole file in binary than to read it line by line from the website with Requests.
The line split may differ too, since line endings will be \r\n and not only \n if you download as binary and then open the file.
import requests

scarlet_pimpernel_link = 'https://gutenberg.org/cache/epub/60/pg60.txt'
response = requests.get(scarlet_pimpernel_link)
with open('pg60.txt', 'wb') as f_web:
    f_web.write(response.content)

with open('pg60.txt', encoding='utf-8') as f:
    for line in f:
        print(line)
        # print(repr(line))  # Show everything, e.g. newline characters



RE: Cannot open url link using urllib.request - Askic - Oct-25-2020

(Oct-25-2020, 04:22 PM)snippsat Wrote: Always use Requests instead of urllib, and you will avoid errors like this.

Hello snippsat,
There is a slightly different number of lines if I compare your solution with mine (using requests).
The actual problem is that I'm using a code snippet from an ebook, so it should work.
The book can be found here:
https://www.dbooks.org/python-regex-1590/
The exercise is on page 17.
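
A small offline sketch of why line counts can differ between approaches: the count depends on which characters the splitting rule treats as a line end. The sample bytes and the function name count_lines are made up for illustration. Which rule a given library applies is best checked in its own documentation, so this is not a claim about what happened with the Gutenberg file.

```python
import io


def count_lines(data: bytes) -> dict:
    """Count the lines in data under three different splitting rules."""
    # Iterating a binary stream splits on b"\n" only.
    binary_iteration = sum(1 for _ in io.BytesIO(data))
    # A text stream with default settings treats \n, \r and \r\n as line ends.
    text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
    text_iteration = sum(1 for _ in text_stream)
    # str.splitlines() additionally splits on form feed and a few other separators.
    str_splitlines = len(data.decode("utf-8").splitlines())
    return {
        "binary_iteration": binary_iteration,
        "text_iteration": text_iteration,
        "str_splitlines": str_splitlines,
    }


# Made-up sample mixing \r\n, a lone \r, a lone \n and a form feed (\x0c).
SAMPLE = b"one\r\ntwo\rthree\nfour\x0cfive\r\n"

if __name__ == "__main__":
    print(count_lines(SAMPLE))
```

For this sample the three rules give 3, 4 and 5 lines respectively, so a small difference between two ways of counting does not necessarily mean that data was lost.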


RE: Cannot open url link using urllib.request - Askic - Oct-25-2020

I'd just like to add that this program runs without any exception:

import urllib.request

sp_link = r'https://www.kernel.org/doc/readme/drivers-staging-lustre-README.txt'
count = 0
try:
    with urllib.request.urlopen(sp_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)
I cannot understand what the problem is. It must be something about the URL.
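
One thing that differs between urllib and requests, apart from the URL, is the set of request headers each sends by default. Whether that is what this particular server objects to is an assumption and is not confirmed anywhere in this thread. The sketch below only shows how to see the default User-agent that urllib sends and how to build a request with explicit headers; the helper names, the User-Agent string and the Accept value are hypothetical choices for illustration.

```python
import urllib.request


def default_urllib_user_agent() -> str:
    """Return the User-agent value that a default urllib opener sends."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request carrying explicit User-Agent and Accept headers."""
    headers = {"User-Agent": user_agent, "Accept": "text/plain, */*"}
    return urllib.request.Request(url, headers=headers)


def count_lines(request: urllib.request.Request) -> int:
    """Open the request (needs network access) and count the body lines."""
    with urllib.request.urlopen(request) as ip_file:
        return sum(1 for _ in ip_file)


if __name__ == "__main__":
    print(default_urllib_user_agent())
    req = build_request(
        "https://gutenberg.org/cache/epub/60/pg60.txt",
        "line-counter-example/0.1",  # hypothetical identifier
    )
    print(req.header_items())
    # count_lines(req) would actually send the request; it is left out here
    # so that the sketch runs without network access.
```

If count_lines(req) succeeds where the plain urlopen(url) call fails, the headers are the likely difference; if it fails the same way, the cause lies elsewhere.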