HTTPError: Forbidden when try download image
Python Forum (https://python-forum.io) > Python Coding (https://python-forum.io/forum-7.html) > Web Scraping & Web Development (https://python-forum.io/forum-13.html) > Thread (/thread-1690.html)
HTTPError: Forbidden when try download image - b33g33 - Jan-20-2017

I want to download pictures from wallhaven.cc. I can get the picture URLs, but the images do not download; it gives an error. My code is:

import urllib.request
import requests
from bs4 import BeautifulSoup

imdbUrl="htt"+"ps:"+"//alpha.wallhaven.cc"+"/random?page=4"
r=requests.get(imdbUrl)
soup=BeautifulSoup(r.content,"html.parser")
kelimeler=soup.find_all("img",{"class":"lazyload"})  # kelimeler: the thumbnail <img> tags
say=0  # say (Turkish "count"): numbers the output files
for i in kelimeler:
    say +=1
    url=str(i['data-src'])
    # Turn the thumbnail URL into the full-size image URL
    url=url.replace("alpha","wallpapers")
    url=url.replace("/thumb/small/th-","/full/wallhaven-")
    url=url.replace("https","http")
    yeniad=str(say)+".jpg"  # yeniad (Turkish "new name"): output filename
    url=url.strip()
    print(url)
    urllib.request.urlretrieve(url,yeniad)

but it gives an error like this:
RE: HTTPError: Forbidden when try download image - Larz60+ - Jan-20-2017

What gets printed from:
print(url)
What the heck is this?
imdbUrl="htt"+"ps:"+"//alpha.wallhaven.cc"+"/random?page=4"
Why not:
imdbUrl = 'https://alpha.wallhaven.cc/random?page=4'

RE: HTTPError: Forbidden when try download image - wavic - Jan-20-2017

According to the documentation there is no urllib.request.get() method. I didn't find it. There is urllib.request.urlopen(). It's better to use Requests.

RE: HTTPError: Forbidden when try download image - b33g33 - Jan-20-2017

(Jan-20-2017, 07:55 PM)Larz60+ Wrote: what gets printed from:
It shows the image URL.

(Jan-20-2017, 08:43 PM)wavic Wrote: According to documentation there is no urllib.request.get() method. I didn't find it. There is urllib.request.urlopen()
Please write it for me, I can't understand anything.

RE: HTTPError: Forbidden when try download image - snippsat - Jan-20-2017

They are blocking urllib, but it works with Requests (which you should use anyway).

Quote:please write it for me , i cant understand anything
You should try yourself, but to be nice, here is how to download 1 image. Always do a test like this before you make a loop.

from bs4 import BeautifulSoup
import requests
import os

page = 4
url = 'https://alpha.wallhaven.cc/random?page={}'.format(page)
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')

# Parse
kelimeler = soup.find("img", {"class":"lazyload"})
img_nr = os.path.basename(kelimeler['data-src'])
img_nr = img_nr.split('-')[-1]
img_large = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-{}'.format(img_nr)

# Download
down_link = requests.get(img_large)
with open(img_nr, "wb") as img_obj:
    img_obj.write(down_link.content)

RE: HTTPError: Forbidden when try download image - scriptso - Jan-21-2017

Hey there! I'm about to try your script out, but I'm almost 100% sure I know what's going on.
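Once the single-image test works, the loop could look roughly like the sketch below. It assumes the 2017-era URL layout used in this thread (thumbnails ending in `th-<id>.jpg`, full images at `https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-<id>`); the site has changed since, so treat the URLs as placeholders. `thumb_to_full` and `download_all` are hypothetical helper names, not part of any library.

```python
import os

# Assumed (2017-era) URL layout taken from this thread; it may no longer be valid.
PAGE_URL = 'https://alpha.wallhaven.cc/random?page={}'
FULL_URL = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-{}'


def thumb_to_full(thumb_src):
    """Map a thumbnail URL ending in 'th-<id>.jpg' to (full-size URL, filename)."""
    img_nr = os.path.basename(thumb_src).split('-')[-1]  # e.g. '293122.jpg'
    return FULL_URL.format(img_nr), img_nr


def download_all(page=4):
    """Hypothetical helper: download every lazy-loaded image on one listing page."""
    # Third-party packages imported here so thumb_to_full works without them.
    import requests
    from bs4 import BeautifulSoup

    page_resp = requests.get(PAGE_URL.format(page), timeout=10)
    page_resp.raise_for_status()
    soup = BeautifulSoup(page_resp.content, 'html.parser')
    for img in soup.find_all('img', {'class': 'lazyload'}):
        full_url, filename = thumb_to_full(img['data-src'])
        resp = requests.get(full_url, timeout=10)
        if resp.status_code != 200:
            # Report and skip instead of crashing the whole loop.
            print('skipped', full_url, resp.status_code)
            continue
        with open(filename, 'wb') as f:
            f.write(resp.content)
```

Call `download_all(4)` to fetch page 4. Skipping non-200 responses keeps one bad image from stopping the loop, which is easier to debug than the traceback from `urlretrieve()`.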
I have to admit that I gave bs a once-over years ago and have been married to Scrapy ever since (we have a special connection... lol)... BUT when you're doing your parsing, in Scrapy's case (as with BeautifulSoup) there's a default header or "User-Agent" profile... Hmm... can't be much different... Just google "adding user agent header to beautifulsoup" and tada! But...

# After importing what you need...
from random import choice

# You can either list multiple header profiles...
user_agents = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) 'Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19'
]

# Then when calling the start or base URL you pass "headers = "...
# Using choice() to randomize, you can keep this list and be... well, not sneaky,
# because unless you're proxying it's not necessary...

# for each url entry of a row in the text file get
# lead info from yelp related to that url...
for dat in linksandsuch:
    version = choice(user_agents)
    headers = {'User-Agent': version}

##### What would I do? Just a single agent defined by the header value...
#...
#for dat in linksandsuch:
#    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)'}
#
# ....

If I'm wrong, shoot me a message, yes? I'm having an issue with Scrapy's image download function (specifically the renaming of the image, not the download) and I could script something real quick for ya... but teach a man to fish, right? lol

Wait...
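With Requests (which the thread recommends), the User-Agent idea from the post above comes down to passing a `headers` dict to `requests.get()`. This is a minimal sketch: `pick_headers` and `fetch` are hypothetical helpers, and the UA strings are just examples taken from the list above. As the post itself says, a single fixed UA is usually enough; random rotation is optional.

```python
import random

# Example User-Agent strings, taken from the list in the post above.
USER_AGENTS = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
]


def pick_headers(user_agents=USER_AGENTS):
    """Return a headers dict with one randomly chosen User-Agent."""
    return {'User-Agent': random.choice(user_agents)}


def fetch(url):
    """Hypothetical helper: GET `url` with a browser-like User-Agent header."""
    import requests  # third-party; imported here so pick_headers works without it
    resp = requests.get(url, headers=pick_headers(), timeout=10)
    resp.raise_for_status()  # raises requests.HTTPError on 403 and other 4xx/5xx
    return resp.content
```

Note that the header has to be a key/value pair (`{'User-Agent': ...}`); a bare string in braces would be a set, which `requests.get()` cannot use as headers.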
I'm noticing your download method... are you writing the image? One Google search and 30 seconds later... "In Python 3.x, urllib.request.urlretrieve can be used to download files from any remote URL." Not sure where you got that download method; I'm guessing it works if you're writing directly from the URL you called it from... here you're trying to get the img... to respond like it was a page... but forbidden? w/e lol. Try urlretrieve for your download function... google what you must.

----
#Edit Update! So I went ahead and ran your script... Downloaded one image... lol but no 505...??? Maybe your IP got blocked...??? Try adding delays to your script and lowering your throttle?

RE: HTTPError: Forbidden when try download image - wavic - Jan-21-2017

You are missing a comma after the first user agent string.

RE: HTTPError: Forbidden when try download image - snippsat - Jan-21-2017

@scriptso you write a little messy.
You are right that setting a user-agent header can solve it for urllib.
But the clear message is that urllib should not be used when we have Requests.

You can fix urlretrieve() by using opener.retrieve(). That can take a user-agent header.

>>> import urllib.request
>>> img = 'https://wallpapers.wallhaven.cc/wallpapers/full/wallhaven-293122.jpg'
>>> urllib.request.urlretrieve(img, '1.jpg')
Traceback (most recent call last):
urllib.error.HTTPError: HTTP Error 403: Forbidden
>>> # Fix it
>>> opener = urllib.request.FancyURLopener({})
>>> opener.version = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.69 Safari/537.36'
>>> opener.retrieve(img, '1.jpg')
('1.jpg', <http.client.HTTPMessage object at 0x038F2210>)

RE: HTTPError: Forbidden when try download image - scriptso - Jan-21-2017

(Jan-21-2017, 10:03 AM)snippsat Wrote: @scriptso you write a little messy You are right that setting user-agent header can solve it for urllib. But the clear message is that urllib should not be used,when we have Requests.
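`FancyURLopener` belongs to urllib's legacy interface and has been deprecated since Python 3.3. If you want to stay with the standard library, a sketch of the same fix passes the User-Agent through a `urllib.request.Request` object instead of relying on urllib's default `Python-urllib/3.x` agent. `make_request`, `download` and the UA string are illustrative assumptions, and there is no guarantee a given server will accept the request.

```python
import shutil
import urllib.request

# Example browser-like User-Agent (assumption; any realistic UA string works the same way).
UA = ('Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/47.0.2526.69 Safari/537.36')


def make_request(url, user_agent=UA):
    """Build a Request that overrides urllib's default User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download(url, filename, user_agent=UA):
    """Hypothetical helper: stream the response body into `filename`."""
    with urllib.request.urlopen(make_request(url, user_agent), timeout=10) as resp, \
            open(filename, 'wb') as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of reading it all at once
    return filename
```

If you'd rather keep calling `urlretrieve()`, another standard-library option is to create an opener with `urllib.request.build_opener()`, set its `addheaders` to `[('User-Agent', UA)]`, and register it with `urllib.request.install_opener()`; `urlretrieve()` then goes through that opener.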
Can fix urlretrieve() by using opener.retrieve(). That can take a user-agent header.

LMAO! I get that a lot =( ... product of insomnia + scatter brain... good stuff! I totally mixed up your fix and the original poster's code... maybe it is time for sleep X_x. I 'durped' that up. *sigh*