FTP download directories
#1
I am trying to make a script to automate some of my daily FTP downloading at work. I am not a programmer, but I have been practicing Python for a while now. Currently I can download single files, but I can't figure out how to download a whole directory.

For testing purposes I have been trying to download the
http://ftp.debian.org/debian/doc directory.

Could someone point me in the right direction?

My current code is:

# FTP
import ftplib

site_address = input('Please enter FTP address: ')

with ftplib.FTP(site_address) as ftp:  # the with block sends QUIT on exit
    ftp.login()
    print(ftp.getwelcome())
    print('Current directory:', ftp.pwd())
    ftp.dir()
    next_dir = input('Where do you want to go next? ')
    ftp.cwd(next_dir)
    print('Current directory:', ftp.pwd())
    ftp.dir()
    download = input('What file would you like to download? ')

    # use a with block so the local file is closed after the transfer
    with open(download, 'wb') as f:
        ftp.retrbinary('RETR ' + download, f.write)
    print('File download successful')
print('goodbye')
#2
Surround your code with Python tags,
not quote tags ... I fixed your post.
#3
I suggest you use wget instead. 

# download the whole directory 

wget -rpk -l 10 -np -c --random-wait -U Mozilla http://ftp.debian.org/debian/doc
-r - recursive download
-p - get page requisites, not only the HTML files
-k - convert the links so the web pages can be used locally
-l - subdirectory depth level
-np - --no-parent: do not ascend to parent directories
-c - continue in case of network failure or interruption; rerunning the command will not re-download completed files
-U - user agent string
--random-wait - wait a random interval before the next request

You can provide the user if a login is needed: add --user=username --ask-password to the options. Do not use --password="..." or an address like http://user:password@domain.com/dir, because the password will end up in your command history.
You may add -R html,htm to discard certain file types.
-nH - this option forces wget not to create a host.com directory.
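If you would rather drive wget from Python than from the shell, the standard-library subprocess module can run the same command. This is just a sketch; the helper name wget_mirror_cmd is my own, not part of any library, and it assumes the wget binary is on your PATH when you actually run it:

```python
import subprocess

def wget_mirror_cmd(url, depth=10):
    # assemble the same options as the shell one-liner above
    return ['wget', '-rpk', '-l', str(depth), '-np', '-c',
            '--random-wait', '-U', 'Mozilla', url]

cmd = wget_mirror_cmd('http://ftp.debian.org/debian/doc')
print(cmd)
# to actually run it (requires wget on PATH):
# subprocess.run(cmd, check=True)
```

Building the argument list explicitly (instead of one shell string) avoids quoting problems and keeps the password out of any shell history.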
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#4
wget list of security vulnerabilities: https://www.cvedetails.com/vulnerability...-Wget.html

Take a look at pysftp: https://pypi.python.org/pypi/pysftp/0.2.9
#5
Thanks for such quick replies. And sorry, I didn't realize there was a "python tag"; I will use it in the future :P.

Pysftp seems like it has exactly what I need. Weird that I did not run into it before. Thanks Larz60+!

About wget: I have been trying many of those different commands I found on Google and none of them works. Might it be that you need a Unix shell for those? The Python wget library seems to support only a few functions, like wget.download(url), and that's almost it, or I am missing something.

https://pypi.python.org/pypi/wget
#6
Wget is available for Windows too. I use it all the time mostly to download websites. Partially
#7
(Mar-15-2017, 01:08 PM)wavic Wrote: Wget is available for Windows too. I use it all the time mostly to download websites. Partially

Thanks! I didn't know that.
#8
Mostly tutorials, Python module documentation and so on. After that, I can read them offline. However, I do not use Windows at all, so I don't know how to install it on that system.
#9
I tried this with pysftp but still ended up getting a lot of errors. I did install crypto, paramiko and pysftp with the pip install command from CMD.

import pysftp as sftp

def get_file_from_server():
    print("connecting")
    s = sftp.Connection(host='ftp.debian.org', username='anonymous', password='anonymous')
    print("logged in..")
    local_path = "C:/test/"
    remote_path = "/debian/doc/"
    print("downloading")
    s.get(remote_path, local_path)
    print("download complete. closing")
    s.close()

get_file_from_server()
UserWarning: "You will need to explicitly load HostKeys (cnopts.hostkeys.load(filename)) or disable HostKey checking (cnopts.hostkeys = None)."
  pysftp\__init__.py, line 61: warnings.warn(wmsg, UserWarning)
Traceback (excerpt):
  FTP_connect.py, line 5, in get_file_from_server
  __init__.py, line 71, in get_hostkey
    raise SSHException("No hostkey for host %s found." % host)
paramiko.ssh_exception.SSHException: No hostkey for host ftp.debian.org found.

The wget method seems to work, though.
#10
Not all sites work with pysftp.

You can write something with ftplib instead:
from ftplib import FTP
import os

ftp = FTP('ftp.debian.org')
ftp.login()
ftp.cwd('debian')
ftp.cwd('doc')
# get filenames within the directory
filenames = ftp.nlst()
print(filenames)

# download the files
for filename in filenames:
    if filename.endswith('.txt'):
        file_name = os.path.join(r"E:/div", filename)
        # a with block closes each local file after its transfer
        with open(file_name, "wb") as lf:
            ftp.retrbinary("RETR " + filename, lf.write)
ftp.quit()
It almost works; there is a problem with one file in the directory that stalls the download.
So here is something different.
All files come down okay; it is not recursive for sub-folders, which can be a fine training task to write.
from bs4 import BeautifulSoup
import requests
from urllib.request import urlretrieve

url = 'http://ftp.debian.org/debian/doc/'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
# every entry in the directory listing is an <a> tag
for link in soup.find_all('a'):
    href = link.get('href')
    if href and href.endswith('.txt'):
        urlretrieve(url + href, href)
