FTP download directories - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: FTP download directories (/thread-2414.html)
FTP download directories - Sorkkaelain - Mar-14-2017

I am trying to make this script to automate some of my daily FTP downloading at work. I am not a programmer, but I have been practicing Python for a while now. Currently I can download single files, but I can't figure out how to download a whole directory. For testing purposes I have been trying to download the ftp.debian.org/debian/doc directory. Could someone point me in the right direction? My current code is:

```python
# FTP
import ftplib

site_address = input('please enter FTP address: ')
with ftplib.FTP(site_address) as ftp:
    ftp.login()
    print(ftp.getwelcome())
    print('Current directory', ftp.pwd())
    ftp.dir()
    next_dir = input('Where do you want to go next? ')
    ftp.cwd(next_dir)
    print('Current directory', ftp.pwd())
    ftp.dir()
    download = input('what file would you like to download? ')
    # use a context manager so the local file is closed after the transfer
    with open(download, 'wb') as local_file:
        ftp.retrbinary('RETR ' + download, local_file.write)
    print('File download successful')
    ftp.quit()
print('goodbye')
```

RE: FTP download directories - Larz60+ - Mar-15-2017

Surround your code with "python tags", not quote tags ... fixed your post.

RE: FTP download directories - wavic - Mar-15-2017

I suggest you use wget instead.

```shell
# download the whole directory
wget -rpk -l 10 -np -c --random-wait -U Mozilla http://ftp.debian.org/debian/doc
```

-r - recursive download
-p - gets not only the HTML files but also the page requisites (images, stylesheets)
-k - converts the links so the downloaded pages can be used locally
-l - subdirectory depth level
-np - --no-parent: do not ascend to parent directories
-c - continue in case of a network failure or something else; you can run the command again and it will not re-download the files already fetched
-U - user agent
--random-wait - wait a random interval before the next request

You can provide the user if login is needed. Add --user=username --ask-password to the options.
Do not use --password="" or an address of the form ftp://user:password@host/dir, because the password will end up in your shell history. You may add -R html,htm to discard certain file types. -nH forces wget not to create a top-level host.com directory.

RE: FTP download directories - Larz60+ - Mar-15-2017

wget has a list of security vulnerabilities: https://www.cvedetails.com/vulnerability-list/vendor_id-72/product_id-332/GNU-Wget.html

Take a look at pysftp: https://pypi.python.org/pypi/pysftp/0.2.9

RE: FTP download directories - Sorkkaelain - Mar-15-2017

Thanks for such quick replies. And sorry, I didn't realize there was a "python tag"; I will use it in the future :P.

Pysftp seems like it has exactly what I need. Weird that I did not run into it before. Thanks, Larz60+!

About wget: I have been trying many of the different commands I found on Google, and none of them work. Might it be that you need a Unix shell for those? The Python wget library seems to support only a few calls like wget.download(url), and that's almost it, or I am missing something. https://pypi.python.org/pypi/wget

RE: FTP download directories - wavic - Mar-15-2017

Wget is available for Windows too. I use it all the time, mostly to download websites. Partially.

RE: FTP download directories - Sorkkaelain - Mar-15-2017

(Mar-15-2017, 01:08 PM) wavic Wrote: "Wget is available for Windows too. I use it all the time mostly to download websites. Partially"

Thanks! Didn't know that.

RE: FTP download directories - wavic - Mar-15-2017

Mostly tutorials, Python module documentation and so on. After that, I can read them offline. However, I do not use Windows at all, so I don't know how to install it on that system.

RE: FTP download directories - Sorkkaelain - Mar-15-2017

I tried this with pysftp but still ended up getting a lot of errors.
I did install crypto, paramiko and pysftp with the pip install command from CMD.

```python
import pysftp as sftp

def get_file_from_server():
    print("connecting")
    s = sftp.Connection(host='ftp.debian.org', username='anonymous', password='anonymous')
    print("logged in..")
    local_path = "C:/test/"
    remote_path = "/debian/doc/"
    print("downloading")
    s.get(remote_path, local_path)
    print("download complete. closing")
    s.close()

get_file_from_server()
```

The errors:

```
Warning, pysftp\__init__.py, line 61: "You will need to explicitly load HostKeys
(cnopts.hostkeys.load(filename)) or disable HostKey checking (cnopts.hostkeys = None)."
  warnings.warn(wmsg, UserWarning)
File "FTP_connect.py", line 5, in get_file_from_server
File "__init__.py", line 71, in get_hostkey
    raise SSHException("No hostkey for host %s found." % host)
paramiko.ssh_exception.SSHException: No hostkey for host ftp.debian.org found.
```

The wget method seems to work, though.

RE: FTP download directories - snippsat - Mar-15-2017

Not all sites work with pysftp. You can write something with ftplib:

```python
from ftplib import FTP
import os

ftp = FTP('ftp.debian.org')
ftp.login()
ftp.cwd('debian')
ftp.cwd('doc')

# get filenames within the directory
filenames = ftp.nlst()
print(filenames)

# download the files
for file in filenames:
    if '.txt' in file:
        file_name = os.path.join(r"E:/div", file)
        with open(file_name, "wb") as lf:
            ftp.retrbinary("RETR " + file, lf.write)
```

This almost works; there is a problem with one file in the directory that stalls the download, so it can be written differently. With the approach below, all files come through okay. It is not recursive into sub-folders; that can be a fine training task to write.

```python
from bs4 import BeautifulSoup
import requests
from urllib.request import urlretrieve

url = 'http://ftp.debian.org/debian/doc/'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
data = soup.find_all('a')
for name in data:
    if 'txt' in name.get('href'):
        urlretrieve(url + name.get('href'), name.get('href'))
```
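Since the thread asks whether wget needs a Unix shell: the wget binary can also be driven from Python's standard library via subprocess, on Windows as well, as an alternative to the limited wget PyPI module. A minimal sketch, assuming the wget executable is on PATH; the function names and parameters here are my own illustration, not from the thread:

```python
import subprocess

def build_wget_command(url, depth=10):
    """Assemble the recursive wget invocation suggested above as an argument list."""
    return ['wget', '-r', '-p', '-k', '-l', str(depth), '-np', '-c',
            '--random-wait', '-U', 'Mozilla', url]

def mirror(url, depth=10):
    """Run wget as a subprocess; return its exit code (0 means success)."""
    return subprocess.run(build_wget_command(url, depth)).returncode
```

Usage (requires wget installed and network access): mirror('http://ftp.debian.org/debian/doc/'). Passing the arguments as a list, rather than one shell string, avoids shell quoting problems on both Windows and Unix.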
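snippsat leaves recursive download of sub-folders as a training task. A minimal sketch using only ftplib from the standard library, assuming the server returns Unix-style LIST output (first character 'd' marks a directory); the helper names are hypothetical, not from the thread:

```python
import ftplib
import os

def is_directory(list_line):
    """True if a Unix-style LIST line describes a directory (leading 'd')."""
    return list_line.startswith('d')

def entry_name(list_line):
    """Extract the entry name: the last of the 9 whitespace-separated fields."""
    return list_line.split(maxsplit=8)[-1]

def download_tree(ftp, remote_dir, local_dir):
    """Recursively mirror remote_dir on an open ftplib.FTP connection into local_dir."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    lines = []
    ftp.retrlines('LIST', lines.append)
    for line in lines:
        name = entry_name(line)
        if line.startswith('l'):
            continue  # skip symlinks in this sketch
        elif is_directory(line):
            download_tree(ftp, name, os.path.join(local_dir, name))
            ftp.cwd('..')  # the recursive call descended into name; come back up
        else:
            with open(os.path.join(local_dir, name), 'wb') as fh:
                ftp.retrbinary('RETR ' + name, fh.write)

# Usage (requires network access):
#   ftp = ftplib.FTP('ftp.debian.org')
#   ftp.login()
#   download_tree(ftp, '/debian/doc', 'doc')
#   ftp.quit()
```

Parsing LIST output is fragile across servers; on servers that support the MLSD extension, ftplib's mlsd() gives structured entries and is the more robust choice.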