I want to Download all .zip Files From A Website (Project AI)
Nevertheless, you must use raw_input for Python 2.7:
username = raw_input('Username: ')
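If the script ever needs to run on both interpreter versions, a small shim can pick the right function; this is only a sketch, and the name `read_line` is my own, not from the thread:

```python
# raw_input exists only on Python 2; in Python 3 it was renamed to input.
# This shim picks whichever is available so the same prompt code runs on
# both versions.
try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3

# Usage (prompts the user on either version):
# username = read_line('Username: ')
```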
Aug-27-2018, 12:36 AM
Yes I tried that, and got the following Error Traceback :-
Aug-27-2018, 02:30 AM
Read the error. This is not the same error; the previous error has been fixed, and you now have a new error on line 39.
The error message is quite clear about what is missing: do_login expects an argument, credentials, which you have not passed. So line 39 should now be: session = do_login(credentials)
Aug-27-2018, 08:40 AM
(This post was last modified: Aug-27-2018, 08:40 AM by eddywinch82.)
When I run the code, without Dead-Eye's code included, I get a 403 Forbidden error.
I have been looking on the internet for a solution to that problem, and some say you should add a User-Agent header to the code, i.e. set the User-Agent to a web browser; in my case that would be Google Chrome. Can anyone tell me how I can do that? P.S. Thank you Larz60+ for your latest reply. Eddie
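Since the login code already uses a requests session, one common way is to set a default User-Agent header on that session. A minimal sketch, assuming requests is installed; the Chrome version string below is only an illustrative placeholder, not a required value:

```python
import requests

# Give the session a browser-like User-Agent so the server is less
# likely to answer 403 Forbidden. The exact Chrome version string is
# just a placeholder.
session = requests.Session()
session.headers.update({
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/68.0.3440.106 Safari/537.36')
})

# Every request made through this session now carries the header, e.g.:
# resp = session.get('https://www.flightsim.com/')
```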
Aug-27-2018, 11:07 AM
Fix existing issues first; otherwise you may be injecting a new problem.
Did you fix the session = do_login(credentials) issue?
Aug-27-2018, 11:39 AM
(This post was last modified: Aug-27-2018, 11:39 AM by eddywinch82.)
Yes I did fix that issue, and when I ran the code, it ran for a few seconds, ending with an error :-
Here is the error traceback :- The same lines there are repeated many times. From the traceback, it appears the code, when run, is checking for .zip files in the wrong place. It is the last two lines that are repeated many times, before the RuntimeError occurs.
Aug-27-2018, 11:43 AM
This is a memory issue. You have run out of memory because there is code that uses recursion, and it is calling itself more times than your memory can handle. The only code I have to look at is what you posted several posts back, and I don't see any recursion there. Have you made major changes without posting new code?
Aug-27-2018, 12:02 PM
(This post was last modified: Aug-27-2018, 12:02 PM by eddywinch82.)
No changes, apart from fixing the errors; you kindly told me the changes in the text I needed to make to fix them.
Here is the current Python code :-

import sys
import getpass
import hashlib
import requests

BASE_URL = 'https://www.flightsim.com/'


def do_login(credentials):
    session = do_login(credentials)
    session.get(BASE_URL)
    req = session.post(BASE_URL + LOGIN_PAGE, params={'do': 'login'},
                       data=credentials)
    if req.status_code != 200:
        print('Login not successful')
        sys.exit(1)
    # session is now logged in
    return session


def get_credentials():
    username = raw_input('Username: ')
    password = getpass.getpass()
    password_md5 = hashlib.md5(password.encode()).hexdigest()
    return {
        'cookieuser': 1,
        'do': 'login',
        's': '',
        'securitytoken': 'guest',
        'vb_login_md5_password': password_md5,
        'vb_login_md5_password_utf': password_md5,
        'vb_login_password': '',
        'vb_login_password_hint': 'Password',
        'vb_login_username': username,
    }


credentials = get_credentials()
session = do_login(credentials)

import urllib2
from urllib2 import Request, urlopen, URLError
#import urllib
import os
from bs4 import BeautifulSoup
import sys

#Create a new directory to put the files into
#Get the current working directory and create a new directory in it named test
cwd = os.getcwd()
newdir = cwd + "\\test"
print "The current Working directory is " + cwd
os.mkdir(newdir, 0777)
print "Created new directory " + newdir
newfile = open('zipfiles.txt', 'w')
print newfile
print "Running script.. "

#Set variable for page to be open and url to be concatenated
url = "http://www.flightsim.com"
page = urllib2.urlopen('https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62').read()

#File extension to be looked for.
extension = ".zip"

#Use BeautifulSoup to clean up the page
soup = BeautifulSoup(page)
soup.prettify()

#Find all the links on the page that end in .zip
for anchor in soup.findAll('a', href=True):
    links = url + anchor['href']
    if links.endswith(extension):
        newfile.write(links + '\n')
newfile.close()

#Read what is saved in zipfiles.txt and output it to the user
#This is done to create persistent data
newfile = open('zipfiles.txt', 'r')
for line in newfile:
    print line + '/n'
newfile.close()

#Read through the lines in the text file and download the zip files.
#Handle exceptions and print exceptions to the console
with open('zipfiles.txt', 'r') as url:
    for line in url:
        if line:
            try:
                ziplink = line
                #Removes the first 48 characters of the url to get the name of the file
                zipfile = line[48:]
                #Removes the last 4 characters to remove the .zip
                zipfile2 = zipfile[:3]
                print "Trying to reach " + ziplink
                response = urllib2.urlopen(ziplink)
            except URLError as e:
                if hasattr(e, 'reason'):
                    print 'We failed to reach a server.'
                    print 'Reason: ', e.reason
                    continue
                elif hasattr(e, 'code'):
                    print 'The server couldn\'t fulfill the request.'
                    print 'Error code: ', e.code
                    continue
            else:
                zipcontent = response.read()
                completeName = os.path.join(newdir, zipfile2 + ".zip")
                with open(completeName, 'w') as f:
                    print "downloading.. " + zipfile
                    f.write(zipcontent)
                f.close()

print "Script completed"
Aug-27-2018, 12:09 PM
I'm pretty confident that the recursion error is coming from the continue statements in your exception clauses.
If you encounter an error, you use continue, which could loop forever. You should allow the exception to exit (there's a reason for it), or at least abort after several tries (by using a counter).
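A minimal sketch of the counter idea: attempt each download a fixed number of times and then give up rather than continuing forever. The function name, the `opener` parameter, and the limit of three tries are my own assumptions, not from the thread:

```python
try:
    from urllib2 import urlopen, URLError          # Python 2
except ImportError:
    from urllib.request import urlopen             # Python 3
    from urllib.error import URLError

def fetch_with_retries(link, opener=urlopen, max_tries=3):
    """Try to open link, giving up after max_tries failed attempts
    instead of looping forever."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return opener(link)
        except URLError as e:
            print('Attempt %d of %d failed: %s' % (attempt, max_tries, e))
            last_error = e
    raise last_error  # all tries used up; let the error propagate
```

Passing the opener as a parameter also makes the retry logic easy to exercise without touching the network.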
Aug-27-2018, 12:57 PM
(This post was last modified: Aug-27-2018, 12:57 PM by eddywinch82.)
How do I do that, Larz60+? Is the code, when run, checking the wrong area, based on the error traceback I posted before?
I also have the program Wget; would it be easier to use that program for web-scraping the .zip files I want? I tried the following last night in Wget :-

wget -r -np -l 0 -A zip https://www.flightsim.com/vbfs/fslib.php...ch&fsec=62

But when I ran that code in Wget, I got the following error :- 'fsec' is not recognized as an internal or external command, operable program or batch file. Any ideas how I can fix that problem? This was all the info shown in Command Prompt :-

Microsoft Windows [Version 6.1.7601]
Copyright © 2009 Microsoft Corporation. All rights reserved.

C:\Users\Edward>cd\wget

C:\wget>wget -r -np -l 0 -A zip https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62
--2018-08-27 13:40:56--  https://www.flightsim.com/vbfs/fslib.php?do=search
Resolving www.flightsim.com (www.flightsim.com)... 104.28.0.19, 104.28.1.19
Connecting to www.flightsim.com (www.flightsim.com)|104.28.0.19|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: /vbfs/fslib.php?searchid=65875703 [following]
--2018-08-27 13:40:58--  https://www.flightsim.com/vbfs/fslib.php?searchid=65875703
Reusing existing connection to www.flightsim.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'www.flightsim.com/vbfs/fslib.php@do=search.tmp'

www.flightsim.com/v [ <=> ] 57.21K  --.-KB/s  in 0.07s

2018-08-27 13:40:58 (814 KB/s) - 'www.flightsim.com/vbfs/fslib.php@do=search.tmp' saved [58579]

Removing www.flightsim.com/vbfs/fslib.php@do=search.tmp since it should be rejected.
FINISHED --2018-08-27 13:40:58--
Total wall clock time: 2.3s
Downloaded: 1 files, 57K in 0.07s (814 KB/s)
'fsec' is not recognized as an internal or external command, operable program or batch file.

C:\wget>
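The "'fsec' is not recognized" message comes from the shell, not from wget: in the Windows Command Prompt an unquoted & ends the command, so everything after it (fsec=62) is run as a separate command and wget only ever sees the URL up to do=search. Quoting the URL should fix it; a sketch, with the quoting rule demonstrated via echo since the same applies in Unix shells:

```shell
# Quote the whole URL so the shell passes the & through to wget intact:
#
#   wget -r -np -l 0 -A zip "https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62"
#
# Demonstration of the quoting rule (no network access needed):
url="https://www.flightsim.com/vbfs/fslib.php?do=search&fsec=62"
echo "$url"   # the & survives because the string is quoted
```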