 Watch SFTP Folder
#1


Hi Guys,

I'm sure you will be able to tell from reading this, but I am quite new to python :)

What I'm trying to do is connect to a SFTP server to download files - the files are not always available at the same time each day so I'd like to connect at 2pm and just stay connected polling the directory until all the files are available and then download them.

I have no issues downloading the files if they exist when the script is executed, but because of the delays sometimes they are not there and I'll end up needing to execute it again.

I run a query using cx_Oracle and put the results into a list - these are all of the filenames that are expected/required to be downloaded, for example [file1.csv, file2.csv, etc.]

The part I'm stuck on / not sure how to do is making the script stay connected to the server until all files are available.

I've got the code below to work on a local folder to tell me when files are added / removed, so I was thinking I might be able to adapt this to build a list of files that have been added since my connection was made, and then compare that list to the expected list I mentioned above?

import os
import time

path_to_watch = "."
# Snapshot of the directory contents when the script starts
before = set(os.listdir(path_to_watch))
while True:
    time.sleep(10)
    after = set(os.listdir(path_to_watch))
    # Names present now but not before, and vice versa
    added = sorted(after - before)
    removed = sorted(before - after)
    if added:
        print("Added: " + ", ".join(added))
    if removed:
        print("Removed: " + ", ".join(removed))
    before = after
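
The comparison against the expected list that the question asks about could be sketched with a set difference. This is only an illustration: the function name missing_files and the sample filenames are made up here, and the expected list would really come from the cx_Oracle query.

```python
def missing_files(expected, listing):
    """Return the expected filenames that are not yet in the listing, sorted."""
    return sorted(set(expected) - set(listing))


# Hypothetical data: 'expected' stands in for the query results,
# 'listing' for the output of a directory listing call.
expected = ["file1.csv", "file2.csv", "file3.csv"]
listing = ["file2.csv", "readme.txt"]
print(missing_files(expected, listing))  # ['file1.csv', 'file3.csv']
```

When the returned list is empty, every expected file is present and the download can start.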

Hoping someone is kind enough to give me some pointers / some sample code that I may be able to adapt...


Many thanks in advance!
#2
It depends on what package you use to connect to SFTP. For example pysftp has 
pysftp.Connection.listdir()

Also, you may want to use watchdog to monitor for file system changes.

Here is some example code for syncing local and remote (SFTP server) directories.

https://codereview.stackexchange.com/que...local-ones



EDIT: I see the example goes the other way (it syncs from local to remote) and you want the opposite.
Just google watchdog python ftp
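
Since pysftp.Connection.listdir() returns a list of names much like os.listdir() does, the snapshot comparison from the first post should carry over to a remote directory. A rough sketch, where diff_listings is a made-up helper and the two listings are placeholder data rather than real server output:

```python
def diff_listings(before, after):
    """Return (added, removed) names between two directory listings."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)


# Hypothetical listings; in practice each would come from a call such as
# sftp.listdir(remote_dir) on an open pysftp connection.
first = ["file1.csv"]
second = ["file1.csv", "file2.csv"]
added, removed = diff_listings(first, second)
print("Added: " + ", ".join(added))
```

Calling it once per polling interval with the previous and current listing gives the same "Added / Removed" report as the local version.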
#3
Thanks for the reply Buran...
I ended up getting it to work a different way - I'm sure there is a much better way to do this, but it is working the way I need so it'll do for now until I learn more :D

            t0 = time.time()
            while total_files != total_files_required:
                time.sleep(10)
                remote_files = sftp.listdir("REMOTE_DIRECTORY_TO_WATCH")
                logging.info("Robot is currently waiting for the total matched filenames to = the total expected matches, current number of matches is: " + str(total_files))
                for filename in expected_files:
                    t1 = time.time()
                    logging.info("Looking for: " + filename + " in the remote directory.")
                    total_time = t1 - t0
                    # Use >= here: the elapsed time is a float and will almost never equal 7200 exactly
                    if total_time >= 7200:
                        logging.warning("Robot has been waiting for all of the expected extract files for 2 hours & has now aborted.")
                        update_job_log_table("subject for email notification", "job status", "email template to use", job_number)
                        sys.exit(1)
                    if filename + SUFFIX_TO_FETCH in remote_files:
                        logging.info("I've found filename: " + filename + " in the remote directory & will remove it from the list of required files variable so I'm only looking for files that are still missing.")
                        # Rebind to a new list without the found file; the for loop keeps iterating over the old list
                        expected_files = [name for name in expected_files if name != filename]
                        total_files = total_files + 1

So basically I have a query before this part which tells me the filenames that are expected in the remote folder, plus the count of expected files.
It loops through looking for these filenames, and if one is found it adds 1 to the total_files variable, removes that file from the expected_files list variable and continues the loop looking for what is left in expected_files.
Once the total_files variable equals the number of expected files from the query, it will start the download.
If it has been waiting for 2 hours, it will call update_job_log_table to update a table in a db and send a notification email about the failure, and then exit.
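
For anyone adapting this, the same wait-with-timeout idea might be written as a small function using a set of missing names and a deadline. This is a sketch under assumptions, not the code from the post: wait_for_files and its parameters are invented for illustration, and the listing call is passed in as a callable so it can be tried without a server.

```python
import time


def wait_for_files(list_dir, expected, suffix="", timeout=7200, interval=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll list_dir() until every expected name (plus suffix) has been seen.

    Returns the set of names still missing: empty on success, non-empty if
    the timeout elapsed first.
    """
    missing = {name + suffix for name in expected}
    deadline = clock() + timeout
    while True:
        # A name counts as found once it has appeared in any listing
        missing -= set(list_dir())
        if not missing or clock() >= deadline:
            return missing
        sleep(interval)
```

With the names from the post, a call could look like wait_for_files(lambda: sftp.listdir("REMOTE_DIRECTORY_TO_WATCH"), expected_files, suffix=SUFFIX_TO_FETCH). A non-empty result would then be the point to log the warning, call update_job_log_table and exit. Comparing against a deadline with >= avoids relying on the elapsed time hitting an exact value.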

The only part I have not tested is the time part to exit after 2 hours; everything else, however, works fine :)

If someone reads this and wants to suggest better ways of doing this, I'm more than happy to have a read and give it a shot :D