Jan-20-2020, 08:01 AM
I'm trying to write a script that 1) goes to a website, 2) downloads and parses the HTML, 3) downloads a comic image, 4) follows the "previous comic" button, and 5) repeats steps 1-4.
The script fails before it reaches the second image, either when it selects the "previous comic" button or when it downloads the next page.
I have tried tinkering with the different selectors, but I'm not sure why it's not working.
#! python3
#swordscraper.py - Downloads all the swords comics.

import requests, os, bs4

os.chdir(r'C:\Users\bromp\OneDrive\Desktop\Python')
os.makedirs('swords', exist_ok=True) #store comics in /swords
url = 'https://swordscomic.com/' #starting url
while not url.endswith('#'):

    #Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    #Find the URL of the comic image.
    comicElem = soup.select('#comic-image')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        comicUrl = comicElem[0].get('src')
        comicUrl = "http://" + comicUrl
        if 'swords' not in comicUrl:
            comicUrl=comicUrl[:7]+'swordscomic.com/'+comicUrl[7:]

        #Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()

        #Save the image to ./swords
        imageFile = open(os.path.join('swords', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    #Get the Prev button's url.
    prevLink = soup.select('a[id="navigation-previous"]')[1]
    url = 'https://swordscomic.com/' + prevLink.get('href')

This is the output it gives when I run it:
Downloading page https://swordscomic.com/...
Downloading image http://swordscomic.com//media/Swords364t.png...
Traceback (most recent call last):
  File "C:\Users\bromp\AppData\Local\Programs\Python\Python37-32\swordscraper.py", line 40, in <module>
    prevLink = soup.select('a[id="navigation-previous"]')[1]
IndexError: list index out of range

Do I need to use a different module, like Selenium?
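
One thing that may be worth checking before reaching for Selenium: `soup.select()` returns a list of every match, so `[1]` asks for a second match. The IndexError means the list holds fewer than two elements. The sketch below runs against a made-up HTML snippet, not the real page. The markup and the `/swords/CCCLXIII/` href are assumptions for illustration only. It shows the one-element list, uses `select_one()` to take the first match, and builds URLs with `urljoin` so the double slash seen in the output above does not appear.

```python
from urllib.parse import urljoin

import bs4

BASE_URL = 'https://swordscomic.com/'

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = '''
<img id="comic-image" src="/media/Swords364t.png">
<a id="navigation-previous" href="/swords/CCCLXIII/">Previous</a>
'''

soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')

# select() returns a list of all matches. With a single match,
# only index 0 exists, so indexing with [1] raises IndexError.
matches = soup.select('a[id="navigation-previous"]')
print(len(matches))

# select_one() returns the first match, or None when nothing matches.
prev_link = soup.select_one('#navigation-previous')
if prev_link is None:
    next_url = None  # no previous link found; a loop could stop here
else:
    # urljoin resolves the href against the base without doubling slashes.
    next_url = urljoin(BASE_URL, prev_link.get('href'))
print(next_url)

# The same helper works for the image src.
image_url = urljoin(BASE_URL, soup.select_one('#comic-image').get('src'))
print(image_url)
```

If `select_one()` returns None on the real page, printing part of `res.text` would show whether the link is present in the downloaded HTML at all. If it is not, the page may be building it with JavaScript, and that is the case where a browser-driven tool like Selenium could help.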