Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development, Thread: Web scraping: os.path.basename (/thread-12396.html)
Web scraping: os.path.basename - Truman - Aug-22-2018

I'm looking at the tutorial Web-scraping part-2 and have a question regarding this code:

```python
import requests
from bs4 import BeautifulSoup
import webbrowser
import os

url = 'http://xkcd.com/1/'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
text = soup.select_one('#ctitle').text
link = soup.find('div', id='comic').find('img').get('src')
link = link.replace('//', 'http://')

# Image title and link
print(f'{text}\n{link}')

# Download image
img_name = os.path.basename(link)
img = requests.get(link)
with open(img_name, 'wb') as f_out:
    f_out.write(img.content)

# Open image in browser or default image viewer
webbrowser.open_new_tab(img_name)
```

Why is img_name = os.path.basename(link) added? Is that better practice for some reason? I also ran the code with webbrowser.open_new_tab(link) and it works. The script also works without lines 17-20 (the download block).
RE: Web scraping: os.path.basename - Larz60+ - Aug-23-2018

Your link will be http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg, so:

img_name = os.path.basename(link)

will get you:

barrel_cropped_(1).jpg

Doc: see the os.path.basename entry in the Python os.path documentation.
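To make that concrete, here is a minimal sketch of what os.path.split() and os.path.basename() do with the link above. The helper filename_from_url is a hypothetical name, not part of the tutorial; it is included only on the assumption that other image URLs might carry a query string, which os.path.basename alone would keep in the file name.

```python
import os
import posixpath
from urllib.parse import urlsplit

link = 'http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg'

# os.path.split() cuts the string at the last slash into (head, tail)
head, tail = os.path.split(link)

# os.path.basename() returns only that tail, i.e. the file name
img_name = os.path.basename(link)


def filename_from_url(url):
    """Hypothetical helper: use only the path part of the URL for the file name."""
    return posixpath.basename(urlsplit(url).path)


print(head)
print(tail)
print(img_name)
print(filename_from_url(link + '?size=large'))
```

So img_name is just a convenient local file name for open(img_name, 'wb'); the full link is still what gets passed to requests.get().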
RE: Web scraping: os.path.basename - Truman - Aug-23-2018

I read that doc explanation before posting but I didn't understand it, and still don't. This part: "This is the second element of the pair returned by passing path to the function split()." Still, I'm not sure why that is better practice than passing link to webbrowser.open, which also works...

RE: Web scraping: os.path.basename - snippsat - Aug-23-2018

(Aug-23-2018, 09:53 PM)Truman Wrote: Still, not sure why is that a better practise then adding link in webborowser.open which also works...

It's not about best practice; it's an example I made of downloading an image from the web to the local hard drive and then opening that image from the local hard drive in the browser. I could of course have just given the HTML link to the webbrowser module, but then downloading the image and opening the local copy in the browser would not have made sense.

RE: Web scraping: os.path.basename - Truman - Aug-23-2018

Thank you, now doing a more advanced download from a number of pages...

```python
import requests
from bs4 import BeautifulSoup
import os
import webbrowser
browser_path = r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe"
webbrowser.register('mozzila', None, webbrowser.BackgroundBrowser(browser_path))

def image_down(start_img, stop_img):
    for numb in range(start_img, stop_img):
        url = f'http://xkcd.com/{numb}'
        url_get = requests.get(url)
        soup = BeautifulSoup(url_get.content, 'html.parser')
        link = soup.find('div', id='comic').find('img').get('src')
        link = link.replace('//', 'http://')
        img_name = os.path.basename(link)
        webbrowser.get('mozzila').open_new_tab(img_name)
        #try:
            #img = requests.get(link)
            #with open(img_name, 'wb') as f_out:
                #f_out.write(img.content)
        #except:
            # Just want images don't care about errors
            #pass

if __name__ == '__main__':
    start_img = 1
    stop_img = 5
    image_down(start_img, stop_img)
```

It opens only the first image in the first tab; for the rest, the other 3 tabs say that the server is not found.

Solved it.
Just changed line 16 (the open_new_tab call) to webbrowser.get('mozzila').open_new_tab(link). OK, now it's all clear.
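A short sketch of why that change works. img_name is only a bare file name, so the browser can show it only if a file with that name already exists locally; with the download commented out, that presumably was true only for the first comic, saved by the earlier script. The function names absolute_image_url and browser_target are hypothetical, and using urljoin instead of link.replace('//', 'http://') is a substitution of mine, not something from the tutorial. The placeholder bytes stand in for img.content so the example runs without network access.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img src> such as '//imgs.xkcd.com/...' against the page URL."""
    return urljoin(page_url, src)


def browser_target(link, folder='.'):
    """Return a file:// URI if the image is already saved in folder, else the web link."""
    local = Path(folder) / os.path.basename(link)
    if local.is_file():
        return local.resolve().as_uri()
    return link


link = absolute_image_url('http://xkcd.com/1/',
                          '//imgs.xkcd.com/comics/barrel_cropped_(1).jpg')

with tempfile.TemporaryDirectory() as folder:
    # Nothing saved yet: the only thing a browser can open is the web link
    before = browser_target(link, folder)
    # Placeholder bytes stand in for the downloaded img.content
    (Path(folder) / os.path.basename(link)).write_bytes(b'placeholder')
    # The file is on disk now, so the local copy can be opened instead
    after = browser_target(link, folder)

print(before)
print(after)
```

In the loop above, something like webbrowser.get('mozzila').open_new_tab(browser_target(link)) would then open the local copy when the download ran and fall back to the web link when it did not.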