Python Forum

Full Version: I wan't to Download all .zip Files From A Website (Project AI)
Nevertheless, you must use raw_input for Python 2.7:

username = raw_input('Username: ')
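If the same script may later move to Python 3 (where raw_input was renamed to input), one way to stay portable is a small shim; a sketch, not required for a pure 2.7 script:

```python
# raw_input existed only in Python 2; Python 3 renamed it to input.
# This shim picks whichever name is available.
try:
    read_input = raw_input  # Python 2
except NameError:
    read_input = input  # Python 3

# username = read_input('Username: ')  # works under both versions
```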
Yes, I tried that, and got the following error traceback:

Error:
Warning (from warnings module):
  File "C:\Python27\lib\getpass.py", line 92
    return fallback_getpass(prompt, stream)
GetPassWarning: Can not control echo on the terminal.
Warning: Password input may be echoed.
Password: duxforded1
Traceback (most recent call last):
  File "C:\Users\Edward\Desktop\Python 2.79\Web Scraping Code For .ZIP Files 3.py", line 39, in <module>
    session = do_login()
TypeError: do_login() takes exactly 1 argument (0 given)
Read the error. This is not the same error: the previous one has been fixed, and you now have a new error on line 39.
The error message is quite clear about what is missing:
Error:
line 39, in <module>
    session = do_login()
TypeError: do_login() takes exactly 1 argument (0 given)
do_login is expecting an argument, credentials, which you have not passed, so line 39 should now be:
session = do_login(credentials)
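The TypeError is easy to reproduce in isolation; a minimal sketch (the body here is a stand-in, not the real login logic):

```python
def do_login(credentials):
    # Stand-in body; the real function performs the login request.
    return 'session for ' + credentials['vb_login_username']

try:
    do_login()  # no argument passed
except TypeError as exc:
    # Python 2 reports: do_login() takes exactly 1 argument (0 given)
    print(exc)

session = do_login({'vb_login_username': 'Eddie'})  # correct call
```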
When I run the code without Dead-Eye's code included, I get a 403 Forbidden error.

I have been looking on the internet for a solution to that problem. Some say you should add a User-Agent header to the request, identifying it as a web browser; in my case that would be Google Chrome.

Can anyone tell me how I can do that?

P.S. thank you Larz60+ for your latest reply.

Eddie
Fix existing issues first; otherwise you may be injecting a new problem.
Did you fix the
session = do_login(credentials)
issue?
Yes, I did fix that issue. When I ran the code, it ran for a few seconds, ending with an error.

Here is the error traceback:

Error:
Traceback (most recent call last):
  File "C:\Users\Edward\Desktop\Python 2.79\Web Scraping Code For .ZIP Files 3.py", line 38, in <module>
    session = do_login(credentials)
  File "C:\Users\Edward\Desktop\Python 2.79\Web Scraping Code For .ZIP Files 3.py", line 10, in do_login
    session = do_login(credentials)
The last two lines are repeated many times before the RuntimeError occurs. From the traceback, it appears the code, when run, is looking for the .zip files in the wrong place.

Error:
RuntimeError: maximum recursion depth exceeded
This is a recursion issue: you have code that uses recursion, and it is calling itself more times than Python's recursion limit allows (the limit is a safeguard against runaway recursion exhausting the call stack).

The only code I have to look at is what you posted several posts back, and I don't see any recursion there.
Have you made major changes without posting new code?
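For what it's worth, a repeating pair of frames in a traceback is the signature of a function whose first statement calls the function itself. A minimal reproduction of that failure mode (do_login_buggy is an illustrative name, not code from the thread):

```python
def do_login_buggy(credentials):
    # The first statement re-enters the function, so execution never
    # reaches the rest of the body: unbounded recursion.
    session = do_login_buggy(credentials)
    return session

try:
    do_login_buggy({})
except RuntimeError as exc:  # RecursionError is a RuntimeError subclass
    print('hit the limit:', exc)
```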
No changes, apart from fixing the errors; you kindly told me the changes in text I needed to make to fix them.

Here is the current Python code:

import sys
import getpass
import hashlib
import requests

BASE_URL = 'https://www.flightsim.com/'
 
 
def do_login(credentials):
    session = do_login(credentials)
    session.get(BASE_URL)
    req = session.post(BASE_URL + LOGIN_PAGE, params={'do': 'login'}, data=credentials)
    if req.status_code != 200:
        print('Login not successful')
        sys.exit(1)
    # session is now logged in
    return session
 
 
def get_credentials():
    username = raw_input('Username: ')
    password = getpass.getpass()
    password_md5 = hashlib.md5(password.encode()).hexdigest()
    return {
        'cookieuser': 1,
        'do': 'login',
        's': '',
        'securitytoken': 'guest',
        'vb_login_md5_password': password_md5,
        'vb_login_md5_password_utf': password_md5,
        'vb_login_password': '',
        'vb_login_password_hint': 'Password',
        'vb_login_username': username,
        }
 
 
credentials = get_credentials()
session = do_login(credentials)

import urllib2
from urllib2 import Request, urlopen, URLError
#import urllib
import os
from bs4 import BeautifulSoup
import sys

#Create a new directory to put the files into
#Get the current working directory and create a new directory in it named test
cwd = os.getcwd()
newdir = cwd +"\\test"
print "The current Working directory is " + cwd
os.mkdir(newdir, 0777)
print "Created new directory " + newdir
newfile = open('zipfiles.txt','w')
print newfile


print "Running script.. "
#Set variable for page to be open and url to be concatenated 
url = "http://www.flightsim.com"
page = urllib2.urlopen('https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62').read()

#File extension to be looked for. 
extension = ".zip"

#Use BeautifulSoup to clean up the page
soup = BeautifulSoup(page)
soup.prettify()

#Find all the links on the page that end in .zip
for anchor in soup.findAll('a', href=True):
    links = url + anchor['href']
    if links.endswith(extension):
        newfile.write(links + '\n')
newfile.close()

#Read what is saved in zipfiles.txt and output it to the user
#This is done to create presistent data 
newfile = open('zipfiles.txt', 'r')
for line in newfile:
    print line + '\n'
newfile.close()

#Read through the lines in the text file and download the zip files.
#Handle exceptions and print exceptions to the console
with open('zipfiles.txt', 'r') as url:
    for line in url:
        if line:
            try:
                ziplink = line.strip()
                #Removes the first 48 characters of the url to get the name of the file
                zipfile = ziplink[48:]
                #Removes the last 4 characters to remove the .zip
                zipfile2 = zipfile[:-4]
                print "Trying to reach " + ziplink
                response = urllib2.urlopen(ziplink)
            except URLError as e:
                if hasattr(e, 'reason'):
                    print 'We failed to reach a server.'
                    print 'Reason: ', e.reason
                    continue
                elif hasattr(e, 'code'):
                    print 'The server couldn\'t fulfill the request.'
                    print 'Error code: ', e.code
                    continue
            else:
                zipcontent = response.read()
                completeName = os.path.join(newdir, zipfile2+ ".zip")
                with open(completeName, 'wb') as f:
                    print "downloading.. " + zipfile
                    f.write(zipcontent)
print "Script completed"
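As an aside, the link-filtering step itself doesn't strictly need BeautifulSoup; a dependency-free sketch with the standard library's HTML parser (Python 3 shown, and the page string here is a made-up stand-in, not the real site's markup):

```python
from html.parser import HTMLParser

class ZipLinkParser(HTMLParser):
    """Collects href values that end in .zip."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value and value.endswith('.zip'):
                    self.links.append(value)

# Stand-in HTML; in the real script this would be the downloaded page.
page = '<a href="/files/a.zip">A</a><a href="/docs">docs</a><a href="/files/b.zip">B</a>'
parser = ZipLinkParser()
parser.feed(page)
print(parser.links)  # ['/files/a.zip', '/files/b.zip']
```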
I'm pretty confident the recursion error is coming from the first line of do_login itself:

session = do_login(credentials)

That line re-enters do_login on every call (the two repeating frames in your traceback are exactly that pair), so it recurses until the limit is hit. It should create a new session instead:

session = requests.Session()

Separately, the continue statements in your exception clauses silently swallow download failures. You should let the exception exit, and there's a reason for it, or at least abort after several tries (by using a counter).
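A bounded-retry loop of that kind might look like this (the fetch function, URL, and try count are illustrative assumptions, not code from the thread):

```python
def fetch_with_retries(fetch, url, max_tries=3):
    """Call fetch(url), retrying up to max_tries times before giving up."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(url)
        except IOError as exc:  # urllib2's URLError subclasses IOError
            last_error = exc
            print('attempt %d failed: %s' % (attempt, exc))
    raise last_error  # all tries exhausted; let the caller see the failure

# Demonstration with a fake fetcher that fails twice, then succeeds.
calls = {'n': 0}
def flaky_fetch(url):
    calls['n'] += 1
    if calls['n'] < 3:
        raise IOError('temporary failure')
    return 'zip bytes for ' + url

print(fetch_with_retries(flaky_fetch, 'http://example.com/a.zip'))
```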
How do I do that, Larz60+? Is the code, when run, checking the wrong area, based on the traceback error I posted before?

I also have the program Wget; would it be easier to use that for web-scraping the .zip files I want? I tried the following last night in Wget:

wget -r -np -l 0 -A zip https://www.flightsim.com/vbfs/fslib.php...ch&fsec=62

But when I ran that command in Wget, I got the following error:

'fsec' is not recognized as an internal or external command, operable program or batch file.

Any ideas how I can fix that problem ?

This was all the info shown in Command Prompt :-

Microsoft Windows [Version 6.1.7601]
Copyright © 2009 Microsoft Corporation. All rights reserved.

C:\Users\Edward>cd\wget

C:\wget>wget -r -np -l 0 -A zip https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62
--2018-08-27 13:40:56-- https://www.flightsim.com/vbfs/fslib.php?do=search
Resolving www.flightsim.com (www.flightsim.com)... 104.28.0.19, 104.28.1.19
Connecting to www.flightsim.com (www.flightsim.com)|104.28.0.19|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: /vbfs/fslib.php?searchid=65875703 [following]
--2018-08-27 13:40:58-- https://www.flightsim.com/vbfs/fslib.php?searchid=65875703
Reusing existing connection to www.flightsim.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'www.flightsim.com/vbfs/fslib.php@do=search.tmp'

www.flightsim.com/v [ <=> ] 57.21K --.-KB/s in 0.07s

2018-08-27 13:40:58 (814 KB/s) - 'www.flightsim.com/vbfs/fslib.php@do=search.tmp' saved [58579]

Removing www.flightsim.com/vbfs/fslib.php@do=search.tmp since it should be rejected.

FINISHED --2018-08-27 13:40:58--
Total wall clock time: 2.3s
Downloaded: 1 files, 57K in 0.07s (814 KB/s)
'fsec' is not recognized as an internal or external command,
operable program or batch file.

C:\wget>
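The 'fsec' is not recognized message comes from cmd.exe, not from Wget: an unquoted & is a command separator, so the shell cut the URL at do=search and then tried to run fsec=62 as a second command. That is why the transcript shows Wget fetching only fslib.php?do=search. Quoting the URL keeps it in one piece; a sketch:

```shell
# Unquoted, the shell splits the command line at '&' and 'fsec=62'
# never reaches wget. Quoting the URL passes it through whole:
URL="https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62"
echo "$URL"
# wget -r -np -l 0 -A zip "$URL"   # quoted form of the original command
```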