Cannot open url link using urllib.request

Askic · Oct-25-2020, 03:46 PM

Hello Python experts,

When I try to run this small code snippet:

import urllib.request

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
    for line in ip_file:
        count += 1
print(count)

it raises an exception with a message like this:
request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable

What could be the problem? I have checked, and the URL is valid.

ndc85430 · Oct-25-2020, 03:55 PM

Did you try catching the exception and checking its reason attribute (see the documentation for HTTPError)? The status code alone (406) doesn't make it clear why the request was rejected, so the reason may give you more information.
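A minimal sketch of that suggestion, using only the standard library. Besides `reason`, an `HTTPError` also carries the numeric `code`, the response `headers`, and the body of the error page (it is file-like), and those are often more informative than the reason phrase alone. The helper names `describe_http_error` and `fetch_line_count` are hypothetical, not part of urllib.

```python
import urllib.error
import urllib.request


def describe_http_error(err: urllib.error.HTTPError, max_body: int = 200) -> str:
    """Summarise the parts of an HTTPError that usually explain a rejection."""
    # err.code is the numeric status, err.reason the short reason phrase.
    parts = [f"status: {err.code} {err.reason}"]
    # err.headers holds the response headers sent back with the error.
    for name, value in err.headers.items():
        parts.append(f"header: {name}: {value}")
    # HTTPError is also a file-like response, so the error page can be read.
    body = err.read(max_body)
    if body:
        parts.append("body: " + body.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def fetch_line_count(url: str) -> int:
    """Count lines in the resource, printing error details on an HTTP error."""
    count = 0
    try:
        with urllib.request.urlopen(url) as ip_file:
            for _ in ip_file:
                count += 1
    except urllib.error.HTTPError as e:
        print(describe_http_error(e))
        raise
    return count
```

Calling `fetch_line_count('https://gutenberg.org/cache/epub/60/pg60.txt')` would print the status, headers and the start of the server's error page before re-raising the exception.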

Askic · (This post was last modified: Oct-25-2020, 04:19 PM by Askic.)

Yes, I did try that with the following code:

import urllib.error
import urllib.request

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
try:
    with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)

The output is: "Not Acceptable"

In the documentation I found this description:
406: ('Not Acceptable', 'URI not available in preferred format.'),
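The table quoted above is the one Python's standard library uses for status descriptions, and it can be looked up directly. Note that, per the HTTP specification, 406 means the server has no representation matching the request's Accept headers; some servers may also return it when they refuse a request for other reasons, so the description does not necessarily reveal the actual cause.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

status = HTTPStatus(406)
print(status.phrase)        # short reason phrase
print(status.description)   # longer explanation

# BaseHTTPRequestHandler.responses maps each code to (phrase, description).
print(BaseHTTPRequestHandler.responses[406])
```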

However, the following code seems to work properly:

import requests

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'

response = requests.get(scarlet_pimpernel_link)
count = 0
for line in response.iter_lines():
    count += 1
print(count)

That works, but it relies on a third-party module. How can I find out what the actual problem is with urllib?

Can you run the first code snippet, or does it raise the exception for you too?
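One way to narrow it down is to compare what each library sends. The URL is identical in both versions, so a difference in the request headers is a plausible (but unconfirmed) suspect. This sketch prints the headers urllib adds by default and, only if requests happens to be installed, the defaults requests uses.

```python
import urllib.request

# The headers an opener adds to every request unless you override them.
opener = urllib.request.build_opener()
print(opener.addheaders)   # e.g. [('User-agent', 'Python-urllib/3.8')]

# A bare Request object carries no User-Agent or Accept header of its own.
req = urllib.request.Request("https://gutenberg.org/cache/epub/60/pg60.txt")
print(req.header_items())

try:
    import requests  # third-party; used here only for comparison
    print(dict(requests.utils.default_headers()))
except ImportError:
    print("requests is not installed")
```

urllib identifies itself as `Python-urllib/<version>` and sends no Accept header by default, whereas requests sends its own User-Agent and `Accept: */*`; either difference could matter to a server that filters requests.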

snippsat · (This post was last modified: Oct-25-2020, 04:22 PM by snippsat.)

Always use Requests rather than urllib; then you avoid errors like this.
It is also better to download the whole file in binary than to read it line by line from a website.
The line split may differ, because lines will end in \r\n and not only \n if you download as binary and then open the file.

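import requests

scarlet_pimpernel_link = 'https://gutenberg.org/cache/epub/60/pg60.txt'
response = requests.get(scarlet_pimpernel_link)
response.raise_for_status()  # stop here if the server returned an error status
# Save the raw bytes exactly as the server sent them
with open('pg60.txt', 'wb') as f_web:
    f_web.write(response.content)

# Reopen in text mode; universal newlines turn \r\n into \n
with open('pg60.txt', encoding='utf-8') as f:
    for line in f:
        print(line, end='')  # line already ends with a newline
        # print(repr(line))  # shows everything, e.g. newline characters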
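To illustrate the point about line endings: the line count depends on which characters are treated as line breaks. The sample bytes below are made up; whether the Gutenberg file actually contains lone `\r` characters is not known here, so this only shows one possible source of slightly different counts.

```python
import io

# Made-up sample: a Windows line ending, a lone carriage return, a Unix line ending.
data = b"one\r\ntwo\rthree\n"

# Iterating a binary stream (as with a urllib response) splits on b"\n" only.
binary_lines = list(io.BytesIO(data))

# Text mode with universal newlines (the default for open(..., encoding=...))
# treats \r\n, \r and \n all as line endings.
text_lines = list(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8"))

# bytes.splitlines() also recognises \r\n, \r and \n.
split_lines = data.splitlines()

print(len(binary_lines), len(text_lines), len(split_lines))  # 2 3 3
```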

Askic · (This post was last modified: Oct-25-2020, 04:39 PM by Askic.)

Hello snippsat,
Your solution gives a slightly different line count from mine (the requests version).
The real issue is that I'm using a code snippet from an ebook, and it should work as written.
The book is available here:
https://www.dbooks.org/python-regex-1590/
The exercise is on page 17.

Askic · Oct-25-2020, 04:56 PM

I'd just like to add that this program runs without any exception:

import urllib.error
import urllib.request

sp_link = r'https://www.kernel.org/doc/readme/drivers-staging-lustre-README.txt'
count = 0
try:
    with urllib.request.urlopen(sp_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)

I can't figure out what the problem is. It must be something about that particular URL.
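Since the same urllib code works against kernel.org, the difference may lie in how each server treats the request rather than in the URL itself. A minimal sketch, assuming the Gutenberg server is rejecting urllib's default User-Agent (`Python-urllib/3.x`) or the missing Accept header: send explicit headers through `urllib.request.Request`. The header values and the helper names `build_request` and `count_lines` are made up for illustration; whether this avoids the 406 has not been confirmed in this thread.

```python
import urllib.error
import urllib.request

# Assumption: the server filters on these headers. The values are illustrative only.
DEFAULT_HEADERS = {
    "User-Agent": "line-counter/0.1 (example script)",
    "Accept": "text/plain, */*",
}


def build_request(url: str) -> urllib.request.Request:
    """Wrap the URL in a Request so the custom headers are sent with it."""
    # Headers set here take precedence over the opener's default User-agent.
    return urllib.request.Request(url, headers=DEFAULT_HEADERS)


def count_lines(url: str) -> int:
    """Count the lines (as bytes) in the resource at url."""
    count = 0
    with urllib.request.urlopen(build_request(url)) as ip_file:
        for _ in ip_file:  # iterating the response yields one line at a time
            count += 1
    return count
```

For example, `count_lines('https://gutenberg.org/cache/epub/60/pg60.txt')` can be wrapped in the same `try`/`except urllib.error.HTTPError` as above. If it still fails with 406, the headers are not the cause and the response body may say more.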
