Here's the deal: this site is very difficult to scrape.
The reason is that the download URL keeps changing (I would guess to prevent bots).
Try it yourself: the URL you gave me no longer works, though it did when posted.
This is taking too much of my time, and it's proving much more difficult because of the moving target.
Reluctantly, I can't spend any more time on it, at least not today (I have surgery in the AM, so I have to prepare for that).
I would suggest getting the auto password part Dead-eye gave you working first; then you can go to the first page and run the following to get all the links.
name this one: Fspaths.py
from pathlib import Path
import os


class Fspaths:
    def __init__(self):
        # Anchor all paths to the directory the script lives in
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        homepath = Path('.')
        self.datapath = homepath / 'data'
        self.datapath.mkdir(exist_ok=True)
        self.htmlpath = self.datapath / 'html'
        self.htmlpath.mkdir(exist_ok=True)
        self.flightsimpath = self.datapath / 'FlightSimFiles'
        self.flightsimpath.mkdir(exist_ok=True)
        self.page1_html = self.htmlpath / 'pagespan.html'
        self.links = self.flightsimpath / 'links.txt'
        # ScrapeUrlList expects this attribute; paste in the catalog URL
        # with a fresh searchid (see the note about the seed below)
        self.base_catalog_url = ''


if __name__ == '__main__':
    Fspaths()
and this one: ScrapeUrlList.py

import Fspaths
from bs4 import BeautifulSoup
import requests


class ScrapeUrlList:
    def __init__(self):
        self.fpath = Fspaths.Fspaths()
        self.ziplinks = []

    def get_url(self, url):
        # Fetch a page, returning its content, or None on failure
        page = None
        response = requests.get(url)
        if response.status_code == 200:
            page = response.content
        else:
            print(f'Cannot load URL: {url}')
        return page

    def get_catalog(self):
        # Walk the catalog pages, writing a 'name, url' line per zip link
        with self.fpath.links.open('w') as fp:
            baseurl = self.fpath.base_catalog_url
            for pageno in range(1, 254):
                # The paging parameter here is a guess; match it to
                # whatever the site's own page links actually use
                url = f'{baseurl}&page={pageno}'
                print(f'url: {url}')
                page = self.get_url(url)
                if page:
                    soup = BeautifulSoup(page, 'lxml')
                    zip_links = soup.find_all('div', class_="fsc_details")
                    for link in zip_links:
                        fp.write(f"{link.find('a').text}, "
                                 f"{baseurl}/{link.find('a').get('href')}\n")
                else:
                    print(f'No page: {url}')


def main():
    sul = ScrapeUrlList()
    sul.get_catalog()


if __name__ == '__main__':
    main()
The searchid is what changes, and you need to get a new seed (you can change the code to use it as an attribute; the base_catalog_url stub above is meant for that) before creating the download list.
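If it helps, here's a rough, untested sketch of grabbing a fresh seed from the first page. I'm assuming the seed shows up as something like 'searchid=123456' in the page source; adjust the pattern to whatever the site actually embeds:

import re
import requests


def get_searchid(first_page_url):
    # Fetch the first catalog page and pull the first 'searchid=<digits>'
    # out of the raw HTML. The parameter name 'searchid' is my assumption;
    # check the actual links on the page.
    response = requests.get(first_page_url)
    response.raise_for_status()
    match = re.search(r'searchid=(\d+)', response.text)
    return match.group(1) if match else None

You could then splice the returned value into base_catalog_url before running ScrapeUrlList.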
Then (not written) you need to use the created list to download the zip files.
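Something like this (also untested) could serve as a starting point for that last step. It assumes links.txt holds one 'name, url' pair per line, exactly as ScrapeUrlList writes them, and the zip file naming is just a placeholder:

import requests
import Fspaths


def download_zips():
    fpath = Fspaths.Fspaths()
    with fpath.links.open() as fp:
        for line in fp:
            # Split from the right so commas in the name don't eat the URL
            name, url = line.rsplit(', ', 1)
            url = url.strip()
            response = requests.get(url)
            if response.status_code == 200:
                # Placeholder naming; the link text may already end in .zip
                target = fpath.flightsimpath / f'{name}.zip'
                target.write_bytes(response.content)
            else:
                print(f'Cannot download: {url}')


if __name__ == '__main__':
    download_zips()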
This code will build a directory tree named 'data' in whatever directory you put the scripts.
The links file is created in a subdirectory named FlightSimFiles.