(May-22-2018, 06:30 AM)eddywinch82 Wrote: , to start downloading from the last .zip File downloaded, rather than downloading all of the downloaded .zip files again ? I mean can I put in a code, the last .zip file downloaded, and then start downloading from that point ?

Start over in a new folder with my code that has the progress bar. Then let's say you got all .zip files for 69 planes before your connection broke down; now you know that you are missing the last 3 planes.
In my code I use yield url_file_id to generate the urls for all planes, so you can use itertools.islice to slice out the last 3 that are missing. Code:
from bs4 import BeautifulSoup
import requests
from tqdm import tqdm
from itertools import islice

def all_planes():
    '''Generate url links for all planes'''
    url = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    td = soup.find_all('td', width="50%")
    plane_links = [link.find('a').get('href') for link in td]
    for ref in tqdm(plane_links):
        url_file_id = 'http://web.archive.org/web/20041114195147/http://www.projectai.com:80/libraries/{}'.format(ref)
        yield url_file_id

def download(all_planes):
    '''Download zip for 1 plane; feed with more urls to download all planes'''
    # A_300 = next(all_planes())  # Test with first link
    last_3 = islice(all_planes(), 69, 72)
    for plane_url in last_3:
        url_get = requests.get(plane_url)
        soup = BeautifulSoup(url_get.content, 'lxml')
        td = soup.find_all('td', class_="text", colspan="2")
        zip_url = 'http://web.archive.org/web/20041108022719/http://www.projectai.com:80/libraries/download.php?fileid={}'
        for item in tqdm(td):
            zip_name = item.text
            zip_number = item.find('a').get('href').split('=')[-1]
            with open(zip_name, 'wb') as f_out:
                down_url = requests.get(zip_url.format(zip_number))
                f_out.write(down_url.content)

if __name__ == '__main__':
    download(all_planes)

Now looking at the progress bar.
After the first plane is downloaded it's already at about 97%, because islice consumes the first 69 urls before yielding anything, so 70 of the 72 items have passed through tqdm.
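To see just the slicing trick on its own, here is a minimal sketch with a dummy generator; fake_planes() is only a stand-in for all_planes() above, yielding 72 fake names instead of real urls:

```python
from itertools import islice

def fake_planes():
    '''Stand-in for all_planes(): yield 72 fake plane names'''
    for n in range(1, 73):
        yield 'plane_{}'.format(n)

# Skip the first 69 items and take indices 69..71,
# exactly like islice(all_planes(), 69, 72) in the download code.
last_3 = list(islice(fake_planes(), 69, 72))
print(last_3)  # ['plane_70', 'plane_71', 'plane_72']
```

So the generator still runs from the start, but only the last 3 values reach the download loop.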