HTTPError: Forbidden when try download image


HTTPError: Forbidden when try download image - b33g33 - Jan-20-2017

I want to download pictures from wallhaven.cc. I can get the picture URLs, but the images do not download; I get an error.

My code is:

import urllib.request
import requests
from bs4 import BeautifulSoup

imdbUrl="htt"+"ps:"+"//alpha.wallhaven.cc"+"/random?page=4"
r=requests.get(imdbUrl)

soup=BeautifulSoup(r.content,"html.parser")

kelimeler=soup.find_all("img",{"class":"lazyload"})

say=0
for i in kelimeler:
    say +=1
    url=str(i['data-src'])
    url=url.replace("alpha","wallpapers")
    url=url.replace("/thumb/small/th-","/full/wallhaven-")
    url=url.replace("https","http")
    yeniad=str(say)+".jpg"
    url=url.strip()
    print(url)
    urllib.request.urlretrieve(url,yeniad)

But it gives an error like this:

Error:
HTTPError: Forbidden

RE: HTTPError: Forbidden when try download image - Larz60+ - Jan-20-2017

What gets printed by:

print(url)

What the heck is this?

imdbUrl="htt"+"ps:"+"//alpha.wallhaven.cc"+"/random?page=4"

Why not:

imdbUrl = 'https://alpha.wallhaven.cc/random?page=4'

RE: HTTPError: Forbidden when try download image - wavic - Jan-20-2017

According to the documentation, there is no urllib.request.get() method; I couldn't find one. There is urllib.request.urlopen().
It's better to use Requests.

RE: HTTPError: Forbidden when try download image - b33g33 - Jan-20-2017

(Jan-20-2017, 07:55 PM)Larz60+ Wrote: what gets printed from:
print(url)

It shows the image URL.

(Jan-20-2017, 07:55 PM)Larz60+ Wrote: what the heck is this?
imdbUrl="htt"+"ps:"+"//alpha.wallhaven.cc"+"/random?page=4"
why not
imdbUrl = 'https://alpha.wallhaven.cc/random?page=4'

I wrote it that way because I couldn't post the thread when it contained a link. I will use it the way you wrote it.

(Jan-20-2017, 08:43 PM)wavic Wrote: According to documentation there is no urllib.request.get() method. I didn't find it. There is urllib.request.urlopen()
It's better to use Requests

Please write it for me; I can't understand any of this.

RE: HTTPError: Forbidden when try download image - snippsat - Jan-20-2017

They are blocking urllib, but it works with Requests (which you should use anyway).

Quote:please write it for me , i cant understand anything

You should try it yourself, but to be nice, here is how to download one image.
You should always test like this before writing a loop.

from bs4 import BeautifulSoup
import requests
import os

page = 4
url = 'https://alpha.wallhaven.cc/random?page={}'.format(page)
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
# Parse
kelimeler = soup.find("img", {"class":"lazyload"})
img_nr = os.path.basename(kelimeler['data-src'])
img_nr = img_nr.split('-')[-1]
img_large = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-{}'.format(img_nr)
# Download
down_link = requests.get(img_large)
with open(img_nr, "wb") as img_obj:
    img_obj.write(down_link.content)
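
Once the single-image test works, the same steps can be wrapped in a loop. The following is only a sketch under the assumptions already used in this thread (the "lazyload" class, the data-src attribute, and the th-/wallhaven- URL pattern); the function names full_image_url and download_page are made up for illustration, and the site layout may have changed since.

```python
import os


def full_image_url(thumb_url):
    """Turn a thumbnail URL (the img tag's data-src) into the full-size image URL."""
    # ".../thumb/small/th-293122.jpg" -> "293122.jpg"
    img_name = os.path.basename(thumb_url).split('-')[-1]
    return 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-{}'.format(img_name)


def download_page(page, limit=None):
    """Download the full image for each lazy-loaded thumbnail on one random page."""
    # Imported here so the URL helper above works without the third-party packages.
    import requests
    from bs4 import BeautifulSoup

    url = 'https://alpha.wallhaven.cc/random?page={}'.format(page)
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    for img in soup.find_all('img', {'class': 'lazyload'})[:limit]:
        img_url = full_image_url(img['data-src'])
        response = requests.get(img_url)
        # Fail loudly on 403/404 instead of saving an error page as a .jpg
        response.raise_for_status()
        with open(os.path.basename(img_url), 'wb') as img_obj:
            img_obj.write(response.content)
```

Calling download_page(4, limit=3) would try the first three thumbnails of page 4; keeping the URL rewrite in its own function makes it easy to check without touching the network.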

RE: HTTPError: Forbidden when try download image - scriptso - Jan-21-2017

Hey there! I'm about to try your script out, but I'm almost 100% sure I know what's going on. I have to admit that I gave bs a once-over years ago and have been married to Scrapy ever since (we have a special connection... lol). BUT when you're doing your parsing, in Scrapy's case (as with BeautifulSoup) there's a default header or "User-Agent" profile. Hmmmm... it can't be much different.

Just google "adding user agent header to beautifulsoup" and tada! but... I

# After importing what you need, you can either list multiple header profiles...
user_agents = [
    # Adjacent string literals are joined, so this is one user agent string
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
    'Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19'
]

# Then, when calling the start or base URL, you pass "headers=". Using choice() to randomize,
# you can rotate through this list, although unless you're also going through proxies it isn't really necessary.

# for each url entry of a row in the text file get
# lead info from yelp related to that url...

from random import choice

for dat in linksandsuch:  # linksandsuch: your own list of URLs
    version = choice(user_agents)
    headers = {'User-Agent': version}


##### What I would do? Just a single agent defined by the header value..
#...
#for dat in linksandsuch:
#    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)'}
#
#
....

If I'm wrong, shoot me a message, yes? I'm having an issue with Scrapy's image download function (specifically the renaming of the image, not the download), and I can script something real quick for ya... but teach a man to fish, right? lol

Wait... I'm noticing your download method... are you writing the image?

One Google search and 30 seconds later...

In Python 3.x, urllib.request.urlretrieve can be used to download files from any remote URL:

Not sure where you got that download method, which I'm guessing works if you're writing directly from the URL you called it from... here you're trying to get the img to respond like it was a page... but forbidden? w.e lol. Try urlretrieve for your download function... google what you must.

----
#Edit Update!

So I went ahead and ran your script... Downloaded one image... lol but no 505....??? Maybe your IP got blocked...??? Try adding delays to your script and lowering your throttle.
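
For the "adding delays" idea, a minimal sketch could look like the following. The helper name fetch_with_delay and its parameters are hypothetical, and the actual download step is passed in as a function so the pacing logic stays independent of urllib or Requests.

```python
import time


def fetch_with_delay(urls, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index:
            # No pause before the first request, only between requests
            sleep(delay)
        results.append(fetch(url))
    return results
```

For example, fetch_with_delay(image_urls, requests.get, delay=2) would space the requests two seconds apart; whether that is enough to avoid a block depends on the site.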

RE: HTTPError: Forbidden when try download image - wavic - Jan-21-2017

You are missing a comma after the first user agent string.

RE: HTTPError: Forbidden when try download image - snippsat - Jan-21-2017

@scriptso you write a little messy.

You are right that setting a user-agent header can solve it for urllib.
But the clear message is that urllib should not be used when we have Requests.

You can fix urlretrieve() by using opener.retrieve(),
which can take a user-agent header.

>>> import urllib.request
>>> img = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-293122.jpg'
>>> urllib.request.urlretrieve(img, '1.jpg')
Traceback (most recent call last):  
urllib.error.HTTPError: HTTP Error 403: Forbidden

>>> # Fix it
>>> opener = urllib.request.FancyURLopener({}) 
>>> opener.version = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.69 Safari/537.36'
>>> opener.retrieve(img, '1.jpg')
('1.jpg', <http.client.HTTPMessage object at 0x038F2210>)
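
FancyURLopener has been deprecated since Python 3.3, so if you have to stay with urllib, a rough equivalent is to build a urllib.request.Request that carries the User-Agent header and pass it to urlopen(). This is a sketch, not a tested fix for this site: the helper names build_request and retrieve are invented here, and whether the server accepts the request depends on its current rules.

```python
import shutil
import urllib.request

USER_AGENT = ('Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/47.0.2526.69 Safari/537.36')


def build_request(url, user_agent=USER_AGENT):
    """Create a request object that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def retrieve(url, filename, user_agent=USER_AGENT):
    """Download url to filename, streaming the response body to disk."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out)
    return filename
```

Usage would be retrieve(img, '1.jpg') with the img URL from the session above.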

RE: HTTPError: Forbidden when try download image - scriptso - Jan-21-2017

(Jan-21-2017, 10:03 AM)snippsat Wrote: @scriptso you write a little messy. You are right that setting a user-agent header can solve it for urllib. But the clear message is that urllib should not be used when we have Requests. You can fix urlretrieve() by using opener.retrieve(), which can take a user-agent header.

LMAO! I get that a lot =( ... product of insomnia + scatterbrain... good stuff! I totally mixed up your fix and the original poster's... maybe it is time for sleep X_x I 'durped' that up. *sigh*