Jan-22-2020, 04:54 PM
(Jan-22-2020, 11:32 AM)Brompy Wrote: But when I try to use it does download a .png, but it is an unreadable .png file that is only 13kb large. What is going wrong?
Not all days have images, so the file will be only about 13kb on those days.
Setup is:
E:\div_code\home\
  |-- roman.py
  |-- roman_convert
If I change roman.py, it can download more images at once. I also made it show the days with no image as both an integer and a Roman numeral.
Here it downloads 25 days; 4 of those days have no image, as shown in the list below.
[('CCCXLVI.png', 346), ('CCCXLVII.png', 347), ('CCCLVI.png', 356), ('CCCLXIII.png', 363)]
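The roman_convert module itself is not shown in this post. As a rough idea of what its two helpers could look like, here is a minimal sketch. Only the names int_to_roman and roman_to_int come from the import in roman.py; the bodies are my assumption and not necessarily what the real module does.

```python
# Hypothetical stand-in for roman_convert: only the two function names
# are taken from the import in roman.py, the bodies are an assumption.
ROMAN_PAIRS = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
    (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
]
ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def int_to_roman(number):
    """Convert a positive integer to a Roman numeral string."""
    result = []
    for value, symbol in ROMAN_PAIRS:
        # Take as many copies of this symbol as fit, keep the remainder
        count, number = divmod(number, value)
        result.append(symbol * count)
    return ''.join(result)

def roman_to_int(roman):
    """Convert a well-formed Roman numeral string back to an integer."""
    total = 0
    # Pair each symbol with the one after it; the last one is paired with ' '
    for char, next_char in zip(roman, roman[1:] + ' '):
        value = ROMAN_VALUES[char]
        # A smaller symbol before a larger one is subtracted (IV = 4)
        if value < ROMAN_VALUES.get(next_char, 0):
            total -= value
        else:
            total += value
    return total

print(int_to_roman(346))
print(roman_to_int('CCCLXIII'))
```

With helpers like these, the file name CCCXLVI.png maps back to day 346, which is how the pairs in the list are built.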
# roman.py
import requests
from roman_convert import roman_to_int, int_to_roman

def make_url(roman_number):
    return f'https://swordscomic.com/comic/{roman_number}/'

no_image = []
def download(url, img_nr):
    img = requests.get(url)
    img_name = f'{int_to_roman(img_nr)}.png'
    # Days without a comic return a small file (about 13kb),
    # so check the size before creating a file on disk
    if len(img.content) < 15000:
        no_image.append(img_name)
    else:
        with open(img_name, 'wb') as f_out:
            f_out.write(img.content)

if __name__ == '__main__':
    #img_nr = 364
    for img_nr in range(340, 365):
        url = f'https://swordscomic.com/media/Swords{img_nr}t.png'
        roman_number = int_to_roman(img_nr)
        org_link = make_url(roman_number)
        print(f'Dowloading --> {org_link}')
        download(url, img_nr)
    # Convert the skipped file names back to day numbers
    day_int = [roman_to_int(ro.split('.')[0]) for ro in no_image]
    print(list(zip(no_image, day_int)))

Run:
Output:
E:\div_code\home
λ python roman.py
Dowloading --> https://swordscomic.com/comic/CCCXL/
Dowloading --> https://swordscomic.com/comic/CCCXLI/
Dowloading --> https://swordscomic.com/comic/CCCXLII/
Dowloading --> https://swordscomic.com/comic/CCCXLIII/
Dowloading --> https://swordscomic.com/comic/CCCXLIV/
Dowloading --> https://swordscomic.com/comic/CCCXLV/
Dowloading --> https://swordscomic.com/comic/CCCXLVI/
Dowloading --> https://swordscomic.com/comic/CCCXLVII/
Dowloading --> https://swordscomic.com/comic/CCCXLVIII/
Dowloading --> https://swordscomic.com/comic/CCCXLIX/
Dowloading --> https://swordscomic.com/comic/CCCL/
Dowloading --> https://swordscomic.com/comic/CCCLI/
Dowloading --> https://swordscomic.com/comic/CCCLII/
Dowloading --> https://swordscomic.com/comic/CCCLIII/
Dowloading --> https://swordscomic.com/comic/CCCLIV/
Dowloading --> https://swordscomic.com/comic/CCCLV/
Dowloading --> https://swordscomic.com/comic/CCCLVI/
Dowloading --> https://swordscomic.com/comic/CCCLVII/
Dowloading --> https://swordscomic.com/comic/CCCLVIII/
Dowloading --> https://swordscomic.com/comic/CCCLIX/
Dowloading --> https://swordscomic.com/comic/CCCLX/
Dowloading --> https://swordscomic.com/comic/CCCLXI/
Dowloading --> https://swordscomic.com/comic/CCCLXII/
Dowloading --> https://swordscomic.com/comic/CCCLXIII/
Dowloading --> https://swordscomic.com/comic/CCCLXIV/
[('CCCXLVI.png', 346), ('CCCXLVII.png', 347), ('CCCLVI.png', 356), ('CCCLXIII.png', 363)]
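The size check in roman.py works because the response on days without a comic is only about 13kb. A slightly more defensive variant could also look at the PNG signature before saving. This is just a sketch: the helper name looks_like_comic is made up, the payloads are fake, and the 15000 byte threshold is simply the one already used in roman.py.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # first 8 bytes of every PNG file

def looks_like_comic(content, min_size=15000):
    """Return True if the bytes start like a PNG and reach min_size.

    Hypothetical helper; min_size mirrors the threshold in roman.py.
    """
    return content.startswith(PNG_SIGNATURE) and len(content) >= min_size

# Fake payloads, so the check can be tried without any network access
placeholder = PNG_SIGNATURE + b'\x00' * 13000   # small PNG, like a day with no comic
real_image = PNG_SIGNATURE + b'\x00' * 20000    # large enough to count as a comic
error_page = b'<html>Not found</html>' * 1000   # large, but not a PNG at all

print(looks_like_comic(placeholder))
print(looks_like_comic(real_image))
print(looks_like_comic(error_page))
```

In download() this could replace the bare len(img.content) < 15000 test, so that a large HTML error page is not saved under a .png name.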
Here is a quick test with Selenium, as this may not be so easy if you are new to it.
Here I go back 3 times, then pass the page source to BeautifulSoup (BS) to find the real download link in the og:image meta tag (it's not the Roman numeral link).
To download, I use the same function with some modifications.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time, os
import requests

no_image = []
def download(img_url):
    img = requests.get(img_url)
    img_name = os.path.basename(img_url)
    # Skip the small response returned on days with no image
    if len(img.content) < 15000:
        no_image.append(img_name)
    else:
        with open(img_name, 'wb') as f_out:
            f_out.write(img.content)

if __name__ == '__main__':
    #--| Setup
    chrome_options = Options()
    #chrome_options.add_argument("--headless")
    browser = webdriver.Chrome(
        executable_path=r'C:\cmder\bin\chromedriver.exe',
        options=chrome_options
    )
    #--| Parse or automation
    browser.get('https://swordscomic.com/comic/CCCLXV/')
    # Click the "previous" link 3 times, waiting after each click
    for _ in range(3):
        browser.find_elements_by_css_selector('#navigation-previous')[0].click()
        time.sleep(3)
    # Give source code to BeautifulSoup
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    img_url = soup.find('meta', property="og:image")
    img_url = img_url.attrs['content']
    download(img_url)
    browser.quit()
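The og:image lookup can also be tried on its own, without starting a browser. The sketch below uses the standard library's html.parser instead of BeautifulSoup so that it runs with no extra installs. The class name OgImageParser, the sample HTML and the example.com URL are all invented for illustration; the real page source will look different.

```python
from html.parser import HTMLParser

class OgImageParser(HTMLParser):
    """Collect the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'meta' and attrs.get('property') == 'og:image':
            # Keep only the first match, like soup.find() would
            if self.og_image is None:
                self.og_image = attrs.get('content')

# Made-up page source standing in for browser.page_source
sample_html = '''
<html><head>
<meta property="og:title" content="Some comic">
<meta property="og:image" content="https://example.com/media/comic.png">
</head><body></body></html>
'''

parser = OgImageParser()
parser.feed(sample_html)
print(parser.og_image)
```

The value in parser.og_image plays the same role as img_url in the Selenium script, so it could be handed to download() in the same way.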