Python Forum
Downloading Images - Unable to find correct selector
(Jan-22-2020, 11:32 AM)Brompy Wrote: But when I try to use it does download a .png, but it is an unreadable .png file that is only 13kb large. What is going wrong?
Not every day has an image, so on those days the downloaded file is only about 13 KB.
Setup is:
E:\div_code\home\
  |-- roman.py
  |-- roman_convert
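The roman_convert module is imported by roman.py but is not shown in this post. As a rough idea of what it could contain, here is a minimal sketch. The two function names match the import in roman.py; everything else (the _NUMERALS table, the error handling) is an assumption and not the actual module.

```python
# Hypothetical stand-in for roman_convert.py; the real module is not shown in the thread.
_NUMERALS = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
    (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
]

def int_to_roman(number):
    """Convert a positive integer to a Roman numeral string."""
    if number <= 0:
        raise ValueError('Roman numerals need a positive integer')
    result = []
    for value, symbol in _NUMERALS:
        # Take as many copies of this symbol as fit, then continue with the rest
        count, number = divmod(number, value)
        result.append(symbol * count)
    return ''.join(result)

def roman_to_int(roman):
    """Convert a Roman numeral string to an integer (no validation of the input)."""
    values = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
    total = 0
    # Pair every symbol with its successor; a trailing space marks the end
    for char, next_char in zip(roman, roman[1:] + ' '):
        value = values[char]
        # A smaller symbol in front of a larger one is subtracted (IV, XL, CM)
        if next_char != ' ' and value < values[next_char]:
            total -= value
        else:
            total += value
    return total
```

With these, int_to_roman(346) gives 'CCCXLVI' and roman_to_int('CCCLXIII') gives 363.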
Here I have changed roman.py to download several images in one run.
It also reports the days that have no image, both as an integer and as a Roman numeral.
This run downloads 25 days; 4 of them have no image, as the list below shows.
[('CCCXLVI.png', 346), ('CCCXLVII.png', 347), ('CCCLVI.png', 356), ('CCCLXIII.png', 363)]
# roman.py
import requests
from roman_convert import roman_to_int, int_to_roman

def make_url(roman_number):
    return f'https://swordscomic.com/comic/{roman_number}/'

# File names of the days that had no image
no_image = []

def download(url, img_nr):
    img = requests.get(url)
    img_name = f'{int_to_roman(img_nr)}.png'
    # Days without an image return a file of about 13 KB, so small responses count as "no image"
    if len(img.content) < 15000:
        no_image.append(img_name)
        return
    # Open the file only when there is a real image, so no empty .png is left behind
    with open(img_name, 'wb') as f_out:
        f_out.write(img.content)

if __name__ == '__main__':
    for img_nr in range(340, 365):
        url = f'https://swordscomic.com/media/Swords{img_nr}t.png'
        roman_number = int_to_roman(img_nr)
        org_link = make_url(roman_number)
        print(f'Downloading --> {org_link}')
        download(url, img_nr)
    # Convert the Roman numeral file names back to integers for the report
    day_int = [roman_to_int(ro.split('.')[0]) for ro in no_image]
    print(list(zip(no_image, day_int)))
Output:
E:\div_code\home
λ python roman.py
Downloading --> https://swordscomic.com/comic/CCCXL/
Downloading --> https://swordscomic.com/comic/CCCXLI/
Downloading --> https://swordscomic.com/comic/CCCXLII/
Downloading --> https://swordscomic.com/comic/CCCXLIII/
Downloading --> https://swordscomic.com/comic/CCCXLIV/
Downloading --> https://swordscomic.com/comic/CCCXLV/
Downloading --> https://swordscomic.com/comic/CCCXLVI/
Downloading --> https://swordscomic.com/comic/CCCXLVII/
Downloading --> https://swordscomic.com/comic/CCCXLVIII/
Downloading --> https://swordscomic.com/comic/CCCXLIX/
Downloading --> https://swordscomic.com/comic/CCCL/
Downloading --> https://swordscomic.com/comic/CCCLI/
Downloading --> https://swordscomic.com/comic/CCCLII/
Downloading --> https://swordscomic.com/comic/CCCLIII/
Downloading --> https://swordscomic.com/comic/CCCLIV/
Downloading --> https://swordscomic.com/comic/CCCLV/
Downloading --> https://swordscomic.com/comic/CCCLVI/
Downloading --> https://swordscomic.com/comic/CCCLVII/
Downloading --> https://swordscomic.com/comic/CCCLVIII/
Downloading --> https://swordscomic.com/comic/CCCLIX/
Downloading --> https://swordscomic.com/comic/CCCLX/
Downloading --> https://swordscomic.com/comic/CCCLXI/
Downloading --> https://swordscomic.com/comic/CCCLXII/
Downloading --> https://swordscomic.com/comic/CCCLXIII/
Downloading --> https://swordscomic.com/comic/CCCLXIV/
[('CCCXLVI.png', 346), ('CCCXLVII.png', 347), ('CCCLVI.png', 356), ('CCCLXIII.png', 363)]
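The size check used in the script is a heuristic. If you want something a bit stricter, one option is to also look at the first bytes of the response, since every PNG file starts with the same 8-byte signature. The sketch below is only an illustration: classify_download and its return labels are made-up names, and the 15000 byte limit is simply the value used in roman.py.

```python
# Every valid PNG file begins with this fixed 8-byte signature
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def classify_download(content, min_size=15000):
    """Return 'image', 'placeholder' or 'not_png' for downloaded bytes.

    classify_download is a hypothetical helper; min_size mirrors the
    15000 byte threshold from roman.py.
    """
    if not content.startswith(PNG_SIGNATURE):
        # For example an HTML error page saved with a .png name
        return 'not_png'
    if len(content) < min_size:
        # A real PNG, but too small to be a comic page
        return 'placeholder'
    return 'image'
```

In download() you could then call classify_download(img.content) and write the file only when the result is 'image'.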

Here is a quick test with Selenium, as this may not be so easy if you are new to it.
I go back three times, then pass the page source to BeautifulSoup to find the real download link in the og:image meta tag (it is not the Roman numeral link).
To download, I use the same function with some modifications.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time, os
import requests

# File names of the pages that had no image
no_image = []

def download(img_url):
    img = requests.get(img_url)
    img_name = os.path.basename(img_url)
    # Small responses count as "no image", so no file is written for them
    if len(img.content) < 15000:
        no_image.append(img_name)
        return
    with open(img_name, 'wb') as f_out:
        f_out.write(img.content)

if __name__ == '__main__':
    #--| Setup
    chrome_options = Options()
    #chrome_options.add_argument("--headless")
    # Selenium 4 takes the driver path through a Service object
    service = Service(executable_path=r'C:\cmder\bin\chromedriver.exe')
    browser = webdriver.Chrome(service=service, options=chrome_options)

    #--| Parse or automation
    browser.get('https://swordscomic.com/comic/CCCLXV/')
    # Click the "previous" link three times, waiting for each page to load
    for _ in range(3):
        browser.find_element(By.CSS_SELECTOR, '#navigation-previous').click()
        time.sleep(3)

    # Give source code to BeautifulSoup
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    img_url = soup.find('meta', property="og:image")
    img_url = img_url.attrs['content']
    download(img_url)
    browser.quit()
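To see the meta tag lookup on its own, without a browser, here is a small sketch that runs on a static HTML string. It uses html.parser from the standard library instead of BeautifulSoup so that it runs with no extra installs. OgImageParser, find_og_image and the sample HTML (including the example.com URL) are invented for illustration and are not taken from the site.

```python
from html.parser import HTMLParser

class OgImageParser(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        # Only meta tags matter, and the first match wins
        if tag != 'meta' or self.og_image is not None:
            return
        attributes = dict(attrs)
        if attributes.get('property') == 'og:image':
            self.og_image = attributes.get('content')

def find_og_image(html):
    """Return the og:image URL from an HTML string, or None if it is missing."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image

# Made-up page source, standing in for browser.page_source
SAMPLE_HTML = '''
<html><head>
<meta property="og:title" content="Example comic">
<meta property="og:image" content="https://example.com/media/example.png">
</head><body></body></html>
'''

if __name__ == '__main__':
    print(find_og_image(SAMPLE_HTML))
```

Returning None when the tag is missing lets the caller skip the download instead of failing on a missing attribute.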


Messages In This Thread
RE: Downloading Images - Unable to find correct selector - by snippsat - Jan-22-2020, 04:54 PM
