Python Forum
I want to Download all .zip Files From A Website (Project AI)
#1
Hi there,

A while back I downloaded some .zip files using Python code that snippsat and others on here very kindly helped me with. I would now like to download all the available Project AI .zip files from the www.flightsim.com website.

I tried to adapt the original code so it would download all the .zip files from the www.flightsim.com website. Unsurprisingly, my adapted code won't download the files, but there are no errors either; when run, the code simply does nothing. The plane .zip files are not arranged in plane categories this time: there are 253 pages, with .zip files on every one of them, about 2500 .zip files altogether.

The search id is not the same each time you do a search, the number changes. You simply choose the category in the File Library, i.e. Project AI Files, and leave the search box blank if you want to search for all the .zip files. Here is my adapted code:

from bs4 import BeautifulSoup
import requests, zipfile, io, concurrent.futures

def download(number_id):
    # download one file by its fid number
    a_zip = 'http://www.flightsim.com/vbfs/fslib.php?do=copyright&fid={}'.format(number_id)
    with open('{}.zip'.format(number_id), 'wb') as f:
        f.write(requests.get(a_zip).content)

if __name__ == '__main__':
    file_id = list(range(1, 50))
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        for number_id in file_id:
            executor.submit(download, number_id)

def get_zips(zips_page):
    # fetch one search-results page and save every zip linked on it
    zips_source = requests.get(zips_page).text
    zip_soup = BeautifulSoup(zips_source, "html.parser")
    # the attribute value contains ? & =, so it has to be quoted in the selector
    for zip_file in zip_soup.select('a[href*="fslib.php?searchid=65822324&page="]'):
        zip_url = link_root + zip_file['href']
        print('downloading', zip_file.text, '...')
        r = requests.get(zip_url)
        with open(zip_file.text, 'wb') as zipFile:
            zipFile.write(r.content)


def download_links(root, page):
    # collect the copyright/fid links from the search page and hand them to get_zips()
    url = ''.join([root, page])
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")

    for zips_suffix in soup.select('a[href*="fslib.php?do=copyright&fid="]'):
        next_page = ''.join([root, zips_suffix['href']])
        get_zips(next_page)


link_root = 'http://www.flightsim.com/vbfs/fslib.php?'

page = 'do=copyright&fid='
download_links(link_root, page)
Can someone help me make corrections to my code, or point me in the right direction?

Any help would be much appreciated

Eddie
Reply
#2
Also, here is a later Python code. Can it be adapted so that, instead of the last number of planes, it works on the last number of pages out of the 253 in total? This is the code that was used for the Project AI website .zip files:

from bs4 import BeautifulSoup
import requests
from tqdm import tqdm, trange
from itertools import islice
 
def all_planes():
    '''Generate url links for all planes'''
    url = 'http://web.archive.org/web/20031124231537/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    td = soup.find_all('td', width="50%")
    plain_link = [link.find('a').get('href') for link in td]
    for ref in tqdm(plain_link):
         url_file_id = 'http://web.archive.org/web/20031124231537/http://www.projectai.com:80/libraries/{}'.format(ref)
         yield url_file_id
 
def download(all_planes):
    '''Download the zip files for one plane; feed it more urls to download all planes'''
    # A_300 = next(all_planes())  # Test with first link
    last_47 = islice(all_planes(), 25, 72)
    for plane_url in last_47:
        url_get = requests.get(plane_url)
        soup = BeautifulSoup(url_get.content, 'lxml')
        td = soup.find_all('td', class_="text", colspan="2")
        zip_url = 'http://web.archive.org/web/20031124231537/http://www.projectai.com:80/libraries/download.php?fileid={}'
        for item in tqdm(td):
            zip_name = item.text
            zip_number = item.find('a').get('href').split('=')[-1]
            with open(zip_name, 'wb') as f_out:
                down_url = requests.get(zip_url.format(zip_number))
                f_out.write(down_url.content)
 
if __name__ == '__main__':
    download(all_planes)
Eddie
Reply
#3
You need to create a new session with requests.Session().

import sys
import getpass
import hashlib
import requests


BASE_URL = 'https://www.flightsim.com/'


def do_login(credentials):
    session = requests.Session()
    session.get(BASE_URL)
    req = session.post(BASE_URL + LOGIN_PAGE, params={'do': 'login'}, data=credentials)
    if req.status_code != 200:
        print('Login not successful')
        sys.exit(1)
    # session is now logged in
    return session


def get_credentials():
    username = input('Username: ')
    password = getpass.getpass()
    password_md5 = hashlib.md5(password.encode()).hexdigest()
    return {
        'cookieuser': 1,
        'do': 'login',
        's': '',
        'securitytoken': 'guest',
        'vb_login_md5_password': password_md5,
        'vb_login_md5_password_utf': password_md5,
        'vb_login_password': '',
        'vb_login_password_hint': 'Password',
        'vb_login_username': username,
        }


credentials = get_credentials()
session = do_login()

Searching for files works without a session; downloading files needs a valid user login.
I made some example code to try it out, but so far no success.
EDIT: It seems that the user is still not logged in. Maybe I'm sending the wrong parameters to the form.
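One way to find out which parameters the form expects is to scrape the hidden <input> fields from the login page itself and merge them into the POST data. Below is only a rough sketch; the 'vbfs/login.php' path is a guess at the form's action, not something verified against flightsim.com.

from bs4 import BeautifulSoup
import requests

BASE_URL = 'https://www.flightsim.com/'


def collect_form_fields(session, login_url):
    # gather every hidden <input> on the login page so the POST carries
    # whatever tokens the form actually expects
    page = session.get(login_url)
    soup = BeautifulSoup(page.content, 'html.parser')
    fields = {}
    for hidden in soup.find_all('input', type='hidden'):
        name = hidden.get('name')
        if name:
            fields[name] = hidden.get('value', '')
    return fields


# usage sketch -- 'vbfs/login.php' is only a guess at the real form action:
# session = requests.Session()
# data = collect_form_fields(session, BASE_URL + 'vbfs/login.php')
# data.update(get_credentials())   # get_credentials() from the code above
# session.post(BASE_URL + 'vbfs/login.php', params={'do': 'login'}, data=data)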
Reply
#4
The way to solve something like this is to get down to the basics.
Your code seems to run OK up to step 30.
The URL for the first request is:
Output:
http://web.archive.org/web/20031124231537/http://www.projectai.com:80/libraries/download.php?fileid={3810}
so try that by itself with requests:
import requests

url = 'http://web.archive.org/web/20031124231537/http://www.projectai.com:80/libraries/download.php?fileid={3810}'

response = requests.get(url)
print('status code: {}'.format(response.status_code))
if response.status_code == 200:
    print('saving page')
    with open('results.html', 'wb') as fp:
        fp.write(response.content)
It returns a 404 error, which is:
Quote:
404 Not Found
The requested resource could not be found but may be available in the future. Subsequent requests by the client are permissible.

If you try that URL by itself (in a browser), it brings you to a Wayback Machine error page:
Output:
Hrm. The Wayback Machine has not archived that URL. This page is not available on the web because page does not exist
Try it!
If you can find the actual URL, then you can go from there (use dead-eye's code).
NOTE: a session is a good idea, but not strictly needed to download zip files, I do it all the time.
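For what it's worth, a bare-bones no-session download can be as simple as the sketch below; streaming the response keeps memory use down for big zips. The fid in the commented example is just one taken from later in this thread, not verified.

import requests


def fetch_zip(url, filename):
    # stream the response to disk so a big zip file never sits fully in memory
    response = requests.get(url, stream=True)
    if response.status_code != 200:
        print('failed: {} (status {})'.format(url, response.status_code))
        return False
    with open(filename, 'wb') as fp:
        for chunk in response.iter_content(chunk_size=8192):
            fp.write(chunk)
    return True


# example call -- fid and filename are placeholders from later in this thread:
# fetch_zip('https://www.flightsim.com/vbfs/fslib.php?do=copyright&fid=64358', 'paidf042.zip')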
Reply
#5
You need a session.

Just try it with your browser: https://www.flightsim.com/vbfs/fslib.php...fid=202702

If you see this, then you're logged in: [Image: www.flightsim.com]

If not, you see this: [Image: www.flightsim.com]
Reply
#6
Hi guys,

Hi deadeye, I get the following error when I run your code:

Error:
Warning (from warnings module):
  File "C:\Python34\lib\getpass.py", line 101
    return fallback_getpass(prompt, stream)
GetPassWarning: Can not control echo on the terminal.
Warning: Password input may be echoed.
Password: duxforded1
Traceback (most recent call last):
  File "C:\Users\Edward\Desktop\Python 3.4.3\number 10.py", line 42, in <module>
    session = do_login()
TypeError: do_login() missing 1 required positional argument: 'credentials'
Any ideas what is going wrong there?

The download links follow this path: https://www.flightsim.com/vbfs/fslib.php...yright&fid=

with a unique number after the = sign.

And the page links follow this path: https://www.flightsim.com/vbfs/fslib.php...37849&page= with the number of the page after the = sign; there are 253 pages in total. The searchid= number changes each time you do a search.

Larz60+, I am not using the Project AI website paths this time, I am using the Flightsim.com website paths. I appreciate both of you helping me.
Reply
#7
I have found out, through View Page Source (right mouse click), that another path to the Project AI file section is:

https://www.flightsim.com/vbfs/fslib.php...ch&fsec=62
Reply
#8
The following will extract the page links from the web page in your last post and print the URLs that reference pages
(indexes for the remaining pages).

It will also print the download links and zipfile names.
The actual download links appear to be like: https://www.flightsim.com/vbfs/fslib.php...&fid=64358
import requests
from bs4 import BeautifulSoup


class MyAttempt:
    def __init__(self):
        self.build_catalog()

    def build_catalog(self):
        page1_url = 'https://www.flightsim.com/vbfs/fslib.php?searchid=65842563'
        page = self.get_page(page1_url)
        soup = BeautifulSoup(page, 'lxml')
        for link in soup.findAll('a', href=True):
            url = link['href']
            text = link.text
            if 'page=' in url:
                print(f'page in url: {url}\ntext: {text}\n')
            if 'copyright' in url:
                print(f'actual download link: {url}\ntext: {text}\n')
                

    def get_page(self, url):
        ok_status = 200
        page = None
        response = requests.get(url, allow_redirects=False)
        if response.status_code == ok_status:
            page = response.content
        else:
            print(f'Could not load url: {url}')
        return page


if __name__ == '__main__':
    MyAttempt()
Please note copyright!
Reply
#9
That output looks a little messy to me, @Larz60+.

@eddywinch82, look at the code I did before here;
it had some fancy stuff like a progress bar and itertools.islice to pick any range of .zip files wanted.

A quick test with the link in your last post:
from bs4 import BeautifulSoup
import requests

url = 'https://www.flightsim.com/vbfs/fslib.php?searchid=65852160'
base_url = 'https://www.flightsim.com/vbfs'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
zip_links = soup.find_all('div', class_="fsc_details")
for link in zip_links:
    print(link.find('a').text)
    print('-------------')
    print(f"{base_url}/{link.find('a').get('href')}")
Output:
paidf042.zip
-------------
https://www.flightsim.com/vbfs/fslib.php?do=copyright&fid=64358
paidf041.zip
-------------
https://www.flightsim.com/vbfs/fslib.php?do=copyright&fid=64357
paidf040.zip
-------------
https://www.flightsim.com/vbfs/fslib.php?do=copyright&fid=64356
paidf039.zip
-------------
https://www.flightsim.com/vbfs/fslib.php?do=copyright&fid=64355
........
So there are the .zip names with their download links; if you had been logged in, you could download all the .zip files for that page.
Then write code that goes through all the pages (a simple page system: 2, 3, 4, etc...) and downloads them.
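Something along these lines could tie it together. This is only a rough sketch: it assumes a logged-in requests.Session() (see the login attempt in post #3), assumes the &page= parameter simply counts up through the 253 pages, and reuses a searchid that will have expired by the time you run it.

from bs4 import BeautifulSoup
import requests

base_url = 'https://www.flightsim.com/vbfs'
# the searchid below is only an example and expires; page is assumed to count 1..253
search_url = 'https://www.flightsim.com/vbfs/fslib.php?searchid=65852160&page={}'

def download_all(session, last_page=253):
    for page_number in range(1, last_page + 1):
        response = session.get(search_url.format(page_number))
        soup = BeautifulSoup(response.content, 'lxml')
        for div in soup.find_all('div', class_="fsc_details"):
            link = div.find('a')
            zip_name = link.text
            zip_url = f"{base_url}/{link.get('href')}"
            print(f'downloading {zip_name} from {zip_url}')
            with open(zip_name, 'wb') as f_out:
                f_out.write(session.get(zip_url).content)

# download_all(session)  # session = a logged-in requests.Session(), see post #3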
Reply
#10
How do I do that, snippsat? Thanks guys, for all your input.
Reply


