Code Needs finishing Off Help Needed

***snippsat*** · May-21-2018, 01:04 PM

(May-21-2018, 12:43 PM) eddywinch82 wrote: What do I need to type so that the files download with the proper .zip file name?

The code I posted with concurrent.futures was just a quick test to show how it can be done.
Don't try to use concurrent.futures until everything works as it should first.

You have to parse the file name, as I did in post #12 of your other thread with the .utu files.
That isn't easy when you are still struggling with basic Python understanding.
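Post #12 from the other thread isn't reproduced here, so this is only a sketch of the idea, assuming the visible cell text is the .zip file name and the link carries the id as a `fileid` query parameter, as in the download.php?fileid= URL used in the code in this thread. `parse_zip_entry` and the file name `pai_a300.zip` are hypothetical.

```python
import os
from urllib.parse import urlparse, parse_qs

def parse_zip_entry(href, cell_text):
    """Return (file_name, file_id) for one download link.

    Assumes href carries the id as a 'fileid' query parameter and that the
    visible cell text is the .zip file name.
    """
    file_id = parse_qs(urlparse(href).query)["fileid"][0]
    # strip() drops surrounding whitespace/newlines; basename() keeps a stray
    # '/' in the text from turning the name into a sub-folder path
    file_name = os.path.basename(cell_text.strip())
    return file_name, file_id

print(parse_zip_entry("download.php?fileid=1234", "  pai_a300.zip\n"))  # → ('pai_a300.zip', '1234')
```

Reading the query string with `parse_qs` is a little sturdier than `split('=')[-1]` if a link ever carries more than one parameter.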

eddywinch82 · May-21-2018, 01:34 PM

Thanks, snippsat, I will look into that. Do you know what the traceback errors I posted just before mean? Do you know any other programs to increase download speeds in Python? I got traceback errors when using Axel as well.

***snippsat*** · (This post was last modified: May-21-2018, 11:25 PM by snippsat.)

You can try this; I looked at downloading all the .zip files for all planes.
I let it run for about 5 minutes with no errors.
So if this is a one-time operation, it may not be worth looking into concurrent.futures as I showed before.
Take a break for a couple of hours and see if you have gotten all the .zip files.

from bs4 import BeautifulSoup
import requests

def all_planes():
    '''Generate url links for all planes'''
    url = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    td = soup.find_all('td', width="50%")
    plain_link = [link.find('a').get('href') for link in td]
    for ref in plain_link:
        url_file_id = 'http://web.archive.org/web/20041114195147/http://www.projectai.com:80/libraries/{}'.format(ref)
        yield url_file_id

def download(all_planes):
    '''Download the .zip files for one plane; fed with more URLs, it downloads the .zip files for all planes'''
    # A_300 = next(all_planes())  # Test with first link
    zip_url = 'http://web.archive.org/web/20041108022719/http://www.projectai.com:80/libraries/download.php?fileid={}'
    for plane_url in all_planes():
        url_get = requests.get(plane_url)
        soup = BeautifulSoup(url_get.content, 'lxml')
        td = soup.find_all('td', class_="text", colspan="2")
        for item in td:
            zip_name = item.text.strip()  # drop whitespace around the file name
            zip_number = item.find('a').get('href').split('=')[-1]
            # Fetch first, then write, so a failed request doesn't leave an empty file behind
            down_url = requests.get(zip_url.format(zip_number))
            if down_url.status_code != 200:
                print('Skipping {}: HTTP {}'.format(zip_name, down_url.status_code))
                continue
            with open(zip_name, 'wb') as f_out:
                f_out.write(down_url.content)

if __name__ == '__main__':
    download(all_planes)
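For reference only, since the advice above is to get the plain loop working first: if this ever becomes a repeated job, the usual concurrent.futures shape for parallel downloads looks roughly like this. `fetch()` is a hypothetical placeholder for the `requests.get(...)` plus file-write step in `download()`; `download_all` and `max_workers=4` are illustrative choices, and a small pool is presumably kinder to web.archive.org than a large one.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(file_id):
    """Hypothetical stand-in for one download; replace with the requests.get + write step."""
    return "{}.zip".format(file_id)

def download_all(file_ids, max_workers=4):
    """Run fetch() for every id on a small thread pool and collect the results."""
    done = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, fid): fid for fid in file_ids}
        for future in as_completed(futures):
            # result() re-raises any exception that happened inside fetch()
            done.append(future.result())
    return sorted(done)  # as_completed() returns results in finishing order

print(download_all(["1", "2", "3"]))
```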

***snippsat*** · (This post was last modified: May-22-2018, 02:03 AM by snippsat.)

For the code above, a progress bar is nice to have, as I showed in your other thread.
So use tqdm; you can plug it into both loops.
Example:

from tqdm import tqdm

# Then wrap the iterable with tqdm() in the 2 loops:
#     for ref in tqdm(plain_link):   # the loop in all_planes()
#     for item in tqdm(td):          # the inner loop in download()

Now you can see that it's 72 planes in total.
Plane 3, which is downloading now, has 21 .zip files.
Of course, the measure will jump a little, as some planes have more .zip files than others.
Plane 2 had 171 .zip files and plane 1 had 4.

eddywinch82 · (This post was last modified: May-22-2018, 06:55 AM by eddywinch82.)

Thanks for sorting this out for me, snippsat. It's much appreciated. I actually managed to download quite a lot of these .zip files last night, after running one of the codes I already have, but it stopped downloading after a couple of hours. Maybe I was being blocked by a server? What do I need to do today, i.e. what code do I type, to start downloading from the last .zip file downloaded, rather than downloading all of the already downloaded .zip files again? I mean, can I put the last .zip file downloaded into the code and then start downloading from that point?

***snippsat*** · (This post was last modified: May-22-2018, 10:52 AM by snippsat.)

(May-22-2018, 06:30 AM) eddywinch82 wrote: ...to start downloading from the last .zip file downloaded, rather than downloading all of the downloaded .zip files again? I mean, can I put in the last .zip file downloaded and then start downloading from that point?

Start over in a new folder with my code that has the progress bar. Then, let's say you got all the .zip files for 69 planes.
Then your connection breaks down; now you know that you are missing the last 3 planes.
In my code I am using yield url_file_id to generate the URLs for all planes.
So you can use itertools.islice to slice out the last 3 that are missing.
Code:

from bs4 import BeautifulSoup
import requests
from tqdm import tqdm
from itertools import islice

def all_planes():
    '''Generate url links for all planes'''
    url = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    td = soup.find_all('td', width="50%")
    plain_link = [link.find('a').get('href') for link in td]
    for ref in tqdm(plain_link):  # outer progress bar: planes out of 72
        url_file_id = 'http://web.archive.org/web/20041114195147/http://www.projectai.com:80/libraries/{}'.format(ref)
        yield url_file_id

def download(all_planes):
    '''Download the .zip files for the planes picked out with islice'''
    # A_300 = next(all_planes())  # Test with first link
    zip_url = 'http://web.archive.org/web/20041108022719/http://www.projectai.com:80/libraries/download.php?fileid={}'
    last_3 = islice(all_planes(), 69, 72)  # planes 70, 71 and 72 (stop index is exclusive)
    for plane_url in last_3:
        url_get = requests.get(plane_url)
        soup = BeautifulSoup(url_get.content, 'lxml')
        td = soup.find_all('td', class_="text", colspan="2")
        for item in tqdm(td):  # inner progress bar: .zip files for this plane
            zip_name = item.text.strip()
            zip_number = item.find('a').get('href').split('=')[-1]
            # Fetch first, then write, so a failed request doesn't leave an empty file behind
            down_url = requests.get(zip_url.format(zip_number))
            if down_url.status_code != 200:
                tqdm.write('Skipping {}: HTTP {}'.format(zip_name, down_url.status_code))
                continue
            with open(zip_name, 'wb') as f_out:
                f_out.write(down_url.content)

if __name__ == '__main__':
    download(all_planes)

Now look at the progress bar.
After 1 plane is downloaded it's at 97%, because we start at 69 and the total is 72.
[Image: EVDkJu.jpg]
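A quick offline way to check what `islice(all_planes(), 69, 72)` picks out, using a hypothetical `fake_planes()` generator in place of `all_planes()` so no network is needed. islice still steps through the first 69 items, which is cheap here because `all_planes()` only formats URL strings after its single page fetch. The percentages assume the bar counts a plane as finished when the next one is requested, which is how the 97% after one plane lines up.

```python
from itertools import islice

TOTAL_PLANES = 72  # total reported in this thread

def fake_planes(total=TOTAL_PLANES):
    """Hypothetical stand-in for all_planes(): yields one label per plane, numbered from 1."""
    for n in range(1, total + 1):
        yield "plane-{}".format(n)

# islice(gen, 69, 72) pulls and discards items 0..68, then yields items 69, 70, 71
last_3 = list(islice(fake_planes(), 69, 72))
print(last_3)  # → ['plane-70', 'plane-71', 'plane-72']

# Progress-bar percentage after 69, 70, 71 and 72 planes have been counted
print([round(100 * done / TOTAL_PLANES) for done in range(69, 73)])  # → [96, 97, 99, 100]
```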

eddywinch82 · May-22-2018, 11:55 AM

Thanks for that, snippsat. How can I find out the total number of planes altogether? Then I can use your new code once I find out how many I have downloaded. That part is easy: I can simply select all the files in the folder to find out the number of .zip files I have downloaded. It's the first part I need help with.
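Instead of selecting the files by hand, the count can be read with pathlib. `count_zips` is a hypothetical helper name. Keep in mind that the number of .zip files does not map directly to a number of planes, since planes have different numbers of .zip files.

```python
from pathlib import Path

def count_zips(folder="."):
    """Count the .zip files directly inside folder (sub-folders are not searched)."""
    return sum(1 for p in Path(folder).glob("*.zip") if p.is_file())

print(count_zips("."))  # number of .zip files in the current folder
```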

***snippsat*** · May-22-2018, 12:16 PM

(May-22-2018, 11:55 AM) eddywinch82 wrote: Thanks for that, snippsat. How can I find out the total number of planes altogether?

It's 72; that should be clear from what I posted.
Remember that planes differ in how many .zip files they have:
plane 1 has 4 and plane 2 has 171 .zip files.

It should be easy to see with my code: when it says 50/72,
it means that you have gotten all the .zip files for the first 50 planes and 22 are remaining.
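A different approach from slicing by plane number, not used in this thread: skip any .zip that is already on disk, so a rerun picks up where it stopped. `files_to_fetch` is a hypothetical helper; it assumes (zip_name, file_id) pairs like the ones built inside `download()`, and treats a zero-byte file as missing. It cannot tell a truncated file from a complete one.

```python
import os

def files_to_fetch(entries, folder="."):
    """Yield (zip_name, file_id) pairs whose file is not already on disk.

    entries: any iterable of (zip_name, file_id) pairs.
    A zero-byte file counts as missing, since an interrupted run can leave
    an empty file behind.
    """
    for zip_name, file_id in entries:
        path = os.path.join(folder, zip_name)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            continue  # already downloaded, skip it
        yield zip_name, file_id
```

In `download()`, the inner loop would then iterate over `files_to_fetch(...)` fed with the parsed name/id pairs instead of over `td` directly.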

***snippsat*** · May-22-2018, 04:51 PM

Also, with my code you can take it in steps; there is no need to download all 72 planes in one go.
Because islice works on the generator (the yield), you can start where you want.
The code below takes the first 10 planes.

from bs4 import BeautifulSoup
import requests
from tqdm import tqdm
from itertools import islice

def all_planes():
    '''Generate url links for all planes'''
    url = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    td = soup.find_all('td', width="50%")
    plain_link = [link.find('a').get('href') for link in td]
    for ref in tqdm(plain_link):  # outer progress bar: planes out of 72
        url_file_id = 'http://web.archive.org/web/20041114195147/http://www.projectai.com:80/libraries/{}'.format(ref)
        yield url_file_id

def download(all_planes):
    '''Download the .zip files for the planes picked out with islice'''
    # A_300 = next(all_planes())  # Test with first link
    zip_url = 'http://web.archive.org/web/20041108022719/http://www.projectai.com:80/libraries/download.php?fileid={}'
    how_many_planes = islice(all_planes(), 0, 10)  # first 10 planes (indices 0 to 9)
    for plane_url in how_many_planes:
        url_get = requests.get(plane_url)
        soup = BeautifulSoup(url_get.content, 'lxml')
        td = soup.find_all('td', class_="text", colspan="2")
        for item in tqdm(td):  # inner progress bar: .zip files for this plane
            zip_name = item.text.strip()
            zip_number = item.find('a').get('href').split('=')[-1]
            # Fetch first, then write, so a failed request doesn't leave an empty file behind
            down_url = requests.get(zip_url.format(zip_number))
            if down_url.status_code != 200:
                tqdm.write('Skipping {}: HTTP {}'.format(zip_name, down_url.status_code))
                continue
            with open(zip_name, 'wb') as f_out:
                f_out.write(down_url.content)

if __name__ == '__main__':
    download(all_planes)

As an example, the next 20 planes (the stop index is exclusive, so 10 to 30 gives 20 planes):

how_many_planes = islice(all_planes(), 10, 30)
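Because islice's stop index is exclusive, each slice yields stop minus start planes. A small helper can list (start, stop) pairs that cover all 72 planes in steps of 10 without gaps or overlaps; `batches` is a hypothetical name and the batch size of 10 is just the example used above.

```python
from itertools import islice

TOTAL_PLANES = 72  # total reported in this thread

def batches(total=TOTAL_PLANES, size=10):
    """Return (start, stop) pairs for islice that cover every plane exactly once."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]

for start, stop in batches():
    # each pair would go into: islice(all_planes(), start, stop)
    print(start, stop, stop - start, "planes")

# The stop index is exclusive, so 10..30 yields 20 items
print(len(list(islice(range(TOTAL_PLANES), 10, 30))))  # → 20
```

The last batch is (70, 72), which is the same slice as the "last 3" example except that it starts one plane later.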

eddywinch82 · (This post was last modified: May-23-2018, 06:22 AM by eddywinch82.)

Thank you so much, snippsat. I ran your new Python code last night, and now I have downloaded all the planes and .zip files I need. Your help has been very much appreciated. Eddie
