Python Forum
Cannot open url link using urllib.request
#1
Hello Python experts,

When I try to run this short code snippet:
import urllib.request

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
    for line in ip_file:
        count += 1
print(count)
I get an exception like this:
request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable

What could be the problem? I have checked, and the URL is valid.
Reply
#2
Did you try catching the exception and checking the reason field (docs for HTTPError are here)? It's not clear just from the status code (the 406) why the request was bad, so I imagine the reason will give you more info.
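For reference, here is a minimal sketch of what that inspection could look like. The helper name describe_http_error is made up for illustration; code, reason and headers are the attributes that urllib.error.HTTPError exposes. The response headers sometimes hint at what the server objected to, although nothing guarantees that they do for this URL.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Collect the details carried by an HTTPError into a plain dict.

    Hypothetical helper for illustration, not part of urllib.
    """
    return {
        "code": err.code,      # numeric status, e.g. 406
        "reason": err.reason,  # short phrase, e.g. "Not Acceptable"
        "headers": dict(err.headers.items()) if err.headers else {},
    }


def try_open(url):
    """Return None on success, or the error details if the server refuses."""
    try:
        with urllib.request.urlopen(url):
            return None
    except urllib.error.HTTPError as err:
        return describe_http_error(err)
```

Calling try_open() with the Gutenberg link from the first post and printing the result would show the status code, the reason and the headers the server sent back.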
Reply
#3
Yes, I did try that with the following code:
import urllib.error
import urllib.request

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'
count = 0
try:
    with urllib.request.urlopen(scarlet_pimpernel_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)
The output is: "Not Acceptable"

I found the following description in the documentation:
406: ('Not Acceptable', 'URI not available in preferred format.'),

However, the following code seems to work properly:
import requests

scarlet_pimpernel_link = r'https://gutenberg.org/cache/epub/60/pg60.txt'

response = requests.get(scarlet_pimpernel_link)
count = 0
for line in response.iter_lines():
    count += 1
print(count)
So here I'm importing a different module. How can I find out what the problem is?

Can you execute the first code snippet that raises the exception?
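
One difference between the two snippets that can be checked without touching the network is the set of default request headers. The sketch below only shows what a default urllib opener adds on its own; the helper name default_urllib_headers is made up for illustration. Requests, by comparison, identifies itself as python-requests and sends an Accept: */* header by default. Whether that difference is what triggers the 406 here is an assumption, not something established in this thread.

```python
import urllib.request


def default_urllib_headers():
    """Return the headers a default urllib opener adds to every request.

    Hypothetical helper for illustration.
    """
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples
    return dict(opener.addheaders)


print(default_urllib_headers())
```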
Reply
#4
Always use Requests rather than urllib, and you will avoid errors like this.
It is also better to download the whole file in binary than to read it line by line from the website with Requests.
Line splitting may also differ, because the line endings will be \r\n and not only \n if you download the file as binary and then open it.
import requests

scarlet_pimpernel_link = 'https://gutenberg.org/cache/epub/60/pg60.txt'
response = requests.get(scarlet_pimpernel_link)
with open('pg60.txt', 'wb') as f_web:
    f_web.write(response.content)

with open('pg60.txt', encoding='utf-8') as f:
    for line in f:
        print(line)
        # print(repr(line))  # Shows everything, e.g. the newline characters
Reply
#5
(Oct-25-2020, 04:22 PM) snippsat wrote: Always use Requests rather than urllib, and you will avoid errors like this.

Hello snippsat,
The number of lines is slightly different when I compare your solution with mine (using requests).
The actual problem is that I'm using a code snippet from an ebook, so it should work.
The book can be found here:
https://www.dbooks.org/python-regex-1590/
The exercise is on page 17.
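
On the differing line counts: one possible explanation (an assumption, not verified against pg60.txt) is that the approaches do not agree on what ends a line. Iterating over a binary stream splits only on \n, while splitlines() also treats a lone \r as a line ending. The sketch below shows the effect on a made-up byte string, using only the standard library; both function names are hypothetical.

```python
import io


def count_by_newline(data):
    """Count lines the way iterating a binary file object does (split on \\n only)."""
    return sum(1 for _ in io.BytesIO(data))


def count_by_splitlines(data):
    """Count lines the way bytes.splitlines() does (\\n, \\r and \\r\\n all end a line)."""
    return len(data.splitlines())


# Made-up sample containing all three kinds of line ending
sample = b"one\r\ntwo\rthree\nfour"
print(count_by_newline(sample), count_by_splitlines(sample))  # prints: 3 4
```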
Reply
#6
I'd just like to add that this program runs without any exception:

import urllib.error
import urllib.request

sp_link = r'https://www.kernel.org/doc/readme/drivers-staging-lustre-README.txt'
count = 0
try:
    with urllib.request.urlopen(sp_link) as ip_file:
        for line in ip_file:
            count += 1
    print(count)
except urllib.error.HTTPError as e:
    print(e.reason)
I cannot understand what the problem is. It must be something to do with the URL.
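
Since the same code works for kernel.org, the difference may lie in how each server treats the request rather than in the URL itself. A minimal sketch for testing that idea with urllib alone is to send explicit headers through urllib.request.Request. The User-Agent string and the Accept value below are made-up placeholders, and the thread does not confirm that this resolves the 406 from gutenberg.org.

```python
import urllib.request


def build_request(url, user_agent="line-counter-example/0.1"):
    """Build a Request carrying explicit headers instead of urllib's defaults.

    The header values are illustrative assumptions, not known requirements
    of any particular server.
    """
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept": "text/plain, */*",
        },
    )


def count_lines(url):
    """Open the URL with the custom headers and count its lines."""
    with urllib.request.urlopen(build_request(url)) as ip_file:
        return sum(1 for _ in ip_file)
```

Calling count_lines() with the Gutenberg link would then show whether the server accepts the request once those headers are present.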
Reply

