Hey there! I'm about to try your script out, but I'm almost 100% sure I know what's going on. I have to admit I only gave BeautifulSoup a once-over years ago and have been married to Scrapy (we have a special connection... lol)... BUT when you're doing your parsing, in Scrapy's case (same as with BeautifulSoup, or rather whatever HTTP library you fetch with), there's a default header, a "User-Agent" profile. Hmmm... can't be much different...
Just google "adding a user agent header to beautifulsoup" and tada! But here's the gist:
# After importing what you need... you can either list multiple header profiles...
from random import choice

user_agents = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
    'Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19'
]
# Then when calling the start or base URL you pass "headers=". Using random.choice you can rotate through this list and be... well, not sneaky exactly, because unless you're proxying it's not really necessary...
# for each url entry of a row in the text file, get
# lead info from yelp related to that url...
for dat in linksandsuch:
    version = choice(user_agents)
    headers = {'User-Agent': version}
##### What would I do? Just a single agent defined as the header value
##### (note it has to be a dict with a 'User-Agent' key, not a bare set):
#
# for dat in linksandsuch:
#     headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)'}
#
....
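Putting the rotation idea together, here's a minimal sketch. Since you never named your HTTP library, I'm assuming stdlib urllib here; `fetch` and `make_headers` are just names I made up, and the UA strings are a shortened version of the list above:

```python
from random import choice
import urllib.request

# trimmed-down example list; use the full list from above
user_agents = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) '
    'Chrome/18.0.1025.151 Safari/535.19',
    'Opera/9.25 (Windows NT 5.1; U; en)',
]

def make_headers():
    # pick a random profile for each request
    return {'User-Agent': choice(user_agents)}

def fetch(url):
    # send the request with our chosen User-Agent instead of urllib's default
    req = urllib.request.Request(url, headers=make_headers())
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Every call to `fetch` gets a fresh random agent, so repeated hits don't all look identical.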
If I'm wrong, shoot me a message, yes? I'm having an issue with Scrapy's image download function (specifically the renaming of the image, not the download itself) and I can script something real quick for ya... but teach a man to fish, right? lol
Wait... I'm noticing your download method... are you writing the image bytes out yourself?
One google search and 30 seconds later...
In Python 3.x, urllib.request.urlretrieve can be used to download files from any
remote URL:
Not sure where you got that download method; I'm guessing it works if you're writing directly from the URL you called it from... here you're trying to get the img to respond like it was a page... but forbidden? w.e lol. Try urlretrieve for your download function... google what you must.
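Something like this sketch (untested against your site; the UA string and filename are placeholders). One gotcha: `urlretrieve` doesn't take a `headers=` argument, so you install a global opener that carries the User-Agent:

```python
import urllib.request

# urlretrieve uses the globally installed opener, so attach the
# User-Agent there (urlretrieve itself has no headers= parameter)
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent',
                      'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 '
                      '(KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19')]
urllib.request.install_opener(opener)

def download_image(url, filename):
    # fetches the URL and writes the bytes straight to disk
    path, response_headers = urllib.request.urlretrieve(url, filename)
    return path
```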
----
#Edit Update!
So I went ahead and ran your script... downloaded one image... lol, but no 505...??? Maybe your IP got blocked...??? Try adding delays to your script and lowering your throttle?
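For the delays, a minimal sketch you could drop into your loop. The 1-3 second range is an arbitrary guess (tune it to the site), and `polite_pause` is just a name I made up:

```python
import time
from random import uniform

def polite_pause(lo=1.0, hi=3.0):
    # sleep a random interval so requests don't arrive at a fixed rhythm,
    # which looks less bot-like than a constant delay
    delay = uniform(lo, hi)
    time.sleep(delay)
    return delay

# in your loop it would look something like:
# for dat in linksandsuch:
#     ...fetch / download...
#     polite_pause()
```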