Oct-28-2020, 10:10 PM
(This post was last modified: Oct-29-2020, 03:00 AM by Larz60+. Edit Reason: added error and output tags)
Hi!
Can someone please help a newbie here?
I'm trying to download all the images from xkcd.com, using the code from the book Automate the Boring Stuff with Python. The code works, but after a while it stops with an error.
Can someone please tell me what to do? I have searched for an answer but can't find a way to make my code keep running or skip over the errors.
Here's the output and the error message:
Output:
Downloading image https://imgs.xkcd.com/comics/election_night.png...
Downloading page https://xkcd.com/2067/...
Downloading image https:/2067/asset/challengers_header.png...
Error:
Traceback (most recent call last):
  File "C:/Users/Bruker/Desktop/IBE151 Practic. Program/Assignment/Web Scraping/scraping_test3.py", line 27, in <module>
    res = requests.get(comicUrl)
  File "C:\Users\Bruker\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\Bruker\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Bruker\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 516, in request
    prep = self.prepare_request(req)
  File "C:\Users\Bruker\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 449, in prepare_request
    p.prepare(
  File "C:\Users\Bruker\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 314, in prepare
    self.prepare_url(url, params)
  File "C:\Users\Bruker\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 391, in prepare_url
    raise InvalidURL("Invalid URL %r: No host supplied" % url)
requests.exceptions.InvalidURL: Invalid URL 'https:/2067/asset/challengers_header.png': No host supplied
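The last line of the traceback points at the cause. The code builds the image address as 'https:' + src, and the printed URL 'https:/2067/asset/challengers_header.png' means the src on the comic 2067 page was '/2067/asset/challengers_header.png': a path with no host, so requests has nowhere to connect. The comics that downloaded fine presumably use a src that already includes the host, like '//imgs.xkcd.com/comics/election_night.png' from the output. A minimal sketch, assuming only those two src values (reconstructed from the output, not fetched from the site), of how the standard library's urllib.parse.urljoin resolves both forms against the page URL the way a browser would:

```python
from urllib.parse import urljoin

page_url = 'https://xkcd.com/2067/'  # page being scraped when the crash happened

# src values reconstructed from the output above (printed URL = 'https:' + src)
normal_src = '//imgs.xkcd.com/comics/election_night.png'  # includes a host
odd_src = '/2067/asset/challengers_header.png'            # path only, no host

# Gluing 'https:' in front only works when the src already names a host.
print('https:' + normal_src)  # https://imgs.xkcd.com/comics/election_night.png
print('https:' + odd_src)     # https:/2067/asset/challengers_header.png

# urljoin resolves both forms relative to the page they were found on.
print(urljoin(page_url, normal_src))  # https://imgs.xkcd.com/comics/election_night.png
print(urljoin(page_url, odd_src))     # https://xkcd.com/2067/asset/challengers_header.png
```

The resolved URL is what a browser would request; whether the server actually returns an image there is not guaranteed, so error handling around the download is still worth having.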
Here is my code:

import requests
import bs4
import os

url = 'https://xkcd.com' # starting url
os.makedirs('xkcd2', exist_ok=True) # store comics in ./xkcd
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    try:
        res.raise_for_status()
    except Exception as exc:
        print('There was a problem: %s' % (exc))
        pass

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # TODO: Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        comicUrl = 'https:' + comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        try:
            res.raise_for_status()
        except Exception as exc:
            print('There was a problem: %s' % (exc))
            pass

        # TODO: Download the image.
        # TODO: Save the image to ./xkcd.
        imageFile = open(os.path.join('xkcd2', os.path.basename(comicUrl)),'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'https://xkcd.com' + prevLink.get('href')
    # TODO: Get the Prev button's url.

print('Done.')
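Two things keep the try/except in this code from helping. First, the exception is raised by res = requests.get(comicUrl) itself (line 27 in the traceback), and that call sits outside the try, which only wraps res.raise_for_status(). Second, pass inside an except block does nothing: execution continues with the next lines, so a failed response would still be written to disk. A minimal sketch, assuming it is acceptable to skip any image that fails, puts both calls inside one try and returns early on failure. download_image is a hypothetical helper name, and the 30-second timeout is an added assumption not present in the original code. requests.exceptions.RequestException is the base class of requests' own errors, including InvalidURL, HTTPError and ConnectionError.

```python
import os

import requests


def download_image(comic_url, folder):
    """Save one image into folder and return its path, or None if it failed.

    Hypothetical helper: any requests error (malformed URL such as
    'https:/2067/...', connection problem, timeout, HTTP 4xx/5xx) is
    reported and the image is skipped instead of crashing the loop.
    """
    try:
        # Both the request and the status check are inside the try,
        # because InvalidURL is raised by requests.get() itself.
        res = requests.get(comic_url, timeout=30)  # timeout is an assumption
        res.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print('Skipping %s: %s' % (comic_url, exc))
        return None  # unlike 'pass', this really skips the save below

    path = os.path.join(folder, os.path.basename(comic_url))
    # 'with' closes the file even if a write fails part-way through.
    with open(path, 'wb') as image_file:
        for chunk in res.iter_content(100000):
            image_file.write(chunk)
    return path
```

In the loop, the lines from res = requests.get(comicUrl) down to imageFile.close() could be replaced by a single download_image(comicUrl, 'xkcd2') call. Building comicUrl with urljoin(url, comicElem[0].get('src')) (after from urllib.parse import urljoin) instead of 'https:' + ... also turns root-relative src values like '/2067/asset/...' into full URLs. The Prev-link lines stay outside the if/else as they are, so the loop still moves on to the previous comic when an image is skipped.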