Can not make this image downloader work
Python Forum (https://python-forum.io) > Python Coding (https://python-forum.io/forum-7.html) > Web Scraping & Web Development (https://python-forum.io/forum-13.html) > Thread: /thread-27800.html
Can not make this image downloader work - Blue Dog - Jun-22-2020

Hi, I am trying to make a map downloader. I got everything working except the last for loop. I have been working on this for a long time. Here is the code:

    import requests
    import bs4 as bs
    import urllib.request

    url = str(input('URL: '))

    opener = urllib.request.build_opener()
    opener.add_headers = [{'User-Agent' : 'Mozilla'}]
    urllib.request.install_opener(opener)

    raw = requests.get(url).text
    soup = bs.BeautifulSoup(raw, 'html.parser')
    imgs = soup.find_all ('img')
    links = []
    for img in imgs:
        link = img.get('scr')
        #if 'http://' not in link:
            #link = url + link
        links.append(link)

    print('Image detected: ' + str(len(links)))

    for i in range(len(links)):
        filename = str(img.jpg) .format(i)
        urllib.request.urlretrieve(links[i], filename)
    print('Done!')

Here is the error:

    URL: http://legacy.lib.utexas.edu/maps/topo/indiana/
    Image detected: 35
    Traceback (most recent call last):
      File "C:\Users\Kite\Desktop\scraping Images\TUT_7\test_1.py", line 25, in <module>
        urllib.request.urlretrieve(links[i], filename)
      File "C:\Python36\lib\urllib\request.py", line 246, in urlretrieve
        url_type, path = splittype(url)
      File "C:\Python36\lib\urllib\parse.py", line 954, in splittype
        match = _typeprog.match(url)
    TypeError: expected string or bytes-like object

It looks like I need to turn something into a string. If anyone can give me a hand, that would be nice.

RE: Can not make this image downloader work - snippsat - Jun-22-2020

There are several problems here, so this is not even close to working. Before writing more code, you must test that what you get back is actually usable. print() always works as a quick test; here I use pprint(), which makes the content easier to look at.
    import requests
    import bs4 as bs
    import urllib.request
    from pprint import pprint

    url = 'http://legacy.lib.utexas.edu/maps/topo/indiana/'

    opener = urllib.request.build_opener()
    opener.add_headers = [{'User-Agent' : 'Mozilla'}]
    urllib.request.install_opener(opener)

    raw = requests.get(url).text
    soup = bs.BeautifulSoup(raw, 'html.parser')
    imgs = soup.find_all ('img')
    pprint(imgs)

As you can see, these are not the image links of the maps that you want. So I can help write the start, as this is not usable. I am going to throw away urllib, as it should not be used anyway.

    import requests
    from bs4 import BeautifulSoup

    url = 'http://legacy.lib.utexas.edu/maps/topo/indiana/'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    maps = soup.select_one('#actualcontent > ul')
    map_link = maps.find_all('a')
    for link in map_link:
        print(link.get('href'))

So now you can try to figure out how to download these image links, and you do not need to import urllib for this.
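On the TypeError from the first post, here is a minimal sketch, assuming the cause is the misspelled attribute name. In Beautiful Soup, `Tag.get()` returns `None` when a tag does not have the requested attribute, so `img.get('scr')` would fill `links` with `None` values, and `urlretrieve` then complains that it did not receive a string. The HTML snippet and the names `wrong` and `right` are made up for illustration; they are not from the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = (
    '<img src="http://example.com/a.jpg">'
    '<img src="http://example.com/b.jpg">'
)
soup = BeautifulSoup(html, 'html.parser')

# Misspelled attribute name: no <img> tag has a 'scr' attribute,
# so Tag.get() returns None for every tag.
wrong = [img.get('scr') for img in soup.find_all('img')]
print(wrong)  # [None, None]

# Correct attribute name, skipping any tag that has no src at all.
right = [img.get('src') for img in soup.find_all('img') if img.get('src')]
print(right)
```

Printing the list before looping over it, as suggested above, would make the `None` entries visible before any download is attempted.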
RE: Can not make this image downloader work - Blue Dog - Jun-22-2020

Thank you snippsat. Before I start working on the download part, I want to understand the code you gave me. I will be back.

RE: Can not make this image downloader work - Blue Dog - Jun-23-2020

OK, I have been working on this half the night. I just can't get it to work. I am not sure how to download all the maps. There will be many files; I can download one map, but that is it. Here is what I have been working with, the last of many attempts:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://legacy.lib.utexas.edu/maps/topo/indiana/'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    maps = soup.select_one('#actualcontent > ul')
    map_link = maps.find_all('a')
    for link in map_link:
        print(link.get('href'))

    for map in maps:
        with open("maps.jpg", "wb") as file:
            file.write(response.content)
            file.close

I have the script in its own folder, so the maps should be saved in the folder that the script is in. Maybe a while loop will work better. I am just lost on saving the images.

RE: Can not make this image downloader work - snippsat - Jun-23-2020

(Jun-23-2020, 11:50 AM) Blue Dog Wrote: I have the script in its own folder so the maps should be saved in the folder that the script is in. Maybe a while loop will work better. I am just lost on saving the images.

The loop is already done in my code, so inside this loop you can use os.path.basename to get the correct names of the images when saving. Then, to get the content (bytes) of each image, you also need to open its link with Requests; then you can save it. Here I also put in a progress bar with tqdm.
    import requests
    from bs4 import BeautifulSoup
    import os
    # pip install tqdm
    from tqdm import tqdm

    url = 'http://legacy.lib.utexas.edu/maps/topo/indiana/'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    maps = soup.select_one('#actualcontent > ul')
    map_link = maps.find_all('a')[:-1]
    for link in tqdm(map_link):
        img_name = os.path.basename(link.get('href'))
        #print(img_name)
        img = requests.get(link.get('href'))
        with open(img_name, 'wb') as f_out:
            f_out.write(img.content)

RE: Can not make this image downloader work - Blue Dog - Jun-23-2020

Wow, works great. I did not have to install tqdm, so it must have been installed already. I downloaded the doc for os (os.path.basename): "Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split()." What does that mean?

img = requests.get(link.get('href'))
This is a GET request for all links with 'href'.

with open(img_name, 'wb') as f_out:
I think this opens a file that you can put img_name in.

f_out.write(img.content)
This writes the image to the file.

If I am wrong on any of the lines, let me know. I see how you name the file to be downloaded. That was one of the big problems I had; I was thinking each file needed a new name. Thank you so much. I do a lot of metal detecting and I have been making small programs to help me get stuff off the net. snippsat, I just downloaded your tutorial on scraping. Will read it tonight. Thanks

RE: Can not make this image downloader work - snippsat - Jun-23-2020

(Jun-23-2020, 07:12 PM) Blue Dog Wrote: I downloaded the Doc for os.(os.path.basename) Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split().
what does that mean?

It helps to use an interactive shell to test things like this; a better REPL like ptpython or IPython also helps.

    >>> import os
    >>> help(os.path.basename)
    Help on function basename in module ntpath:

    basename(p)
        Returns the final component of a pathname

    >>> url = 'http://legacy.lib.utexas.edu/maps/topo/indexes/txu-pclmaps-topo-in-index-1925.jpg'
    >>> os.path.basename(url)
    'txu-pclmaps-topo-in-index-1925.jpg'

So it's simple functionality; it's not hard to write this yourself.

    >>> url = 'http://legacy.lib.utexas.edu/maps/topo/indexes/txu-pclmaps-topo-in-index-1925.jpg'
    >>> url.split('/')[-1]
    'txu-pclmaps-topo-in-index-1925.jpg'

Blue Dog Wrote: img = requests.get(link.get('href'))

No, the links are already found with find_all('a'). href is used to get the bare image link out of each link tag that was found.

    >>> map_link[0]
    <a href="http://legacy.lib.utexas.edu/maps/topo/indexes/txu-pclmaps-topo-in-index-1925.jpg">Indiana - Topographic Map Index 1925</a>
    >>> map_link[0].get('href')
    'http://legacy.lib.utexas.edu/maps/topo/indexes/txu-pclmaps-topo-in-index-1925.jpg'

Blue Dog Wrote: with open(img_name, 'wb') as f_out: