Thread: Web scraping errors (/thread-30630.html)
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
Web scraping errors - julan2020 - Oct-28-2020

Hi! Can someone please help a newbie here? I'm trying to download all the images from xkcd.com, using the code from the book Automate the Boring Stuff with Python. The code does work, but after a while I get some errors. Can someone please tell me what to do? I have searched for an answer but can't find a way to make my code keep running or skip over the errors.

Here's the error message (in red):
Here is my code:

    import requests
    import bs4
    import os

    url = 'https://xkcd.com' # starting url
    os.makedirs('xkcd2', exist_ok=True) # store comics in ./xkcd
    while not url.endswith('#'):
        # Download the page.
        print('Downloading page %s...' % url)
        res = requests.get(url)
        try:
            res.raise_for_status()
        except Exception as exc:
            print('There was a problem: %s' % (exc))
            pass
        soup = bs4.BeautifulSoup(res.text, 'html.parser')

        # TODO: Find the URL of the comic image.
        comicElem = soup.select('#comic img')
        if comicElem == []:
            print('Could not find comic image.')
        else:
            comicUrl = 'https:' + comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get(comicUrl)
            try:
                res.raise_for_status()
            except Exception as exc:
                print('There was a problem: %s' % (exc))
                pass
            # TODO: Download the image.

            # TODO: Save the image to ./xkcd.
            imageFile = open(os.path.join('xkcd2', os.path.basename(comicUrl)),'wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()

        # Get the Prev button's url.
        prevLink = soup.select('a[rel="prev"]')[0]
        url = 'https://xkcd.com' + prevLink.get('href')
        # TODO: Get the Prev button's url.

    print('Done.')

RE: Web scraping errors - buran - Oct-29-2020

Check line 24: you don't construct a proper URL.
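To illustrate the hint about URL construction: prepending 'https:' only gives a valid address when the src attribute is protocol-relative (starts with '//'). A minimal sketch of one alternative, assuming the src values can also be root-relative or already absolute, is to let `urllib.parse.urljoin` resolve them against the site URL. The helper name `build_comic_url` and the example file names are made up for illustration and are not taken from the thread.

```python
from urllib.parse import urljoin

BASE_URL = 'https://xkcd.com'  # same starting URL as in the post


def build_comic_url(src, base=BASE_URL):
    """Hypothetical helper: turn an <img> src value into an absolute URL."""
    # urljoin resolves protocol-relative ('//host/path'), root-relative
    # ('/path') and already-absolute ('https://...') values against base.
    return urljoin(base, src)


# Example src values (file names are invented for this sketch).
for src in ('//imgs.xkcd.com/comics/example.png',
            '/2198/asset/example.png',
            'https://imgs.xkcd.com/comics/example.png'):
    print(build_comic_url(src))
```

In the posted loop this would replace the string concatenation, for example `comicUrl = build_comic_url(comicElem[0].get('src'))`. To skip a failed image instead of saving it, the save step could go in the `else` branch of the `try`, so it only runs when `raise_for_status()` did not raise.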