Jan-11-2017, 02:10 PM
Hi everyone.
I am running the FreeFileSync app on PC clients to sync/back up the Users folders to a network file server, with versioning turned on.
The system is Windows 7 64-bit with Python 3.5.
Versioning works as follows: if the synced file has changed, the old file is moved into a special folder, and then the new file is synced/copied. The problem is that FreeFileSync has no option to limit the number of versions kept, so with large files (2GB+) that change daily, like Outlook.pst, Thunderbird msf files, etc., the HDD fills up in just a few days or weeks.
So I decided to write a Python script that goes through the versioning folders, separates the versions of each file, and deletes all but the 2 latest. Simply deleting anything older than (current date - x days) is not viable, because if a file has not changed in that time, all of its versions (except the current backup) would be deleted.
Now we come to the point where I am stuck. The file names in the folders I need to parse have the following format:
archive.pst 2016-10-14 080101.pst
archive.pst 2016-10-15 080101.pst
archive.pst 2016-10-17 080101.pst
archive.pst 2016-10-18 080101.pst
archive.pst 2016-10-19 080101.pst
archive.pst 2016-10-20 080101.pst
Outlook.pst 2016-10-14 080101.pst
Outlook.pst 2016-10-15 080101.pst
Outlook.pst 2016-10-17 080101.pst
Outlook.pst 2016-10-18 080101.pst
Outlook.pst 2016-10-19 080101.pst
Outlook.pst 2016-10-20 080101.pst
Outlook.sharing.xml.obi 2016-10-14 080101.obi
Outlook.sharing.xml.obi 2016-10-15 080101.obi
Outlook.sharing.xml.obi 2016-10-17 080101.obi
Outlook.sharing.xml.obi 2016-10-18 080101.obi
Outlook.sharing.xml.obi 2016-10-19 080101.obi
Outlook.sharing.xml.obi 2016-10-20 080101.obi
Notice that a versioned file name consists of the original file name with its extension, then a space, then the date as YYYY-MM-DD, another space, then the time as HHmmss, and finally the extension of the original file again.
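As a minimal sketch of that naming pattern, a versioned name could be split into the original name and its timestamp roughly like this. The regular expression and the `parse_version_name` helper are hypothetical names for illustration, not something FreeFileSync provides, and the sketch assumes every versioned name follows the format described above.

```python
import re
from datetime import datetime

# Assumed layout: "<original name> YYYY-MM-DD HHMMSS<.ext>" (extension optional)
VERSION_RE = re.compile(
    r"^(?P<original>.+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{6})(?P<ext>\.[^.]+)?$"
)


def parse_version_name(name):
    """Return (original_name, timestamp), or None if the name does not match."""
    match = VERSION_RE.match(name)
    if match is None:
        return None
    stamp = datetime.strptime(
        match.group("date") + " " + match.group("time"), "%Y-%m-%d %H%M%S"
    )
    return match.group("original"), stamp


print(parse_version_name("Outlook.sharing.xml.obi 2016-10-14 080101.obi"))
```

Names that do not match the pattern return None, so unrelated files in the folder can be skipped instead of being misread.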
What I need is a "key" function for sorted() that finds and separates the versions belonging to each original file (like archive.pst, Outlook.pst, Outlook.sharing.xml.obi).
I think the best way to tell them apart is to take the extension from the end of the file name (.pst, .obi) and locate that substring in the rest of the name. Then, using the sublist of files that belong to one original file, sort them by the datetime in the file name (newest first?), because the file's own date might differ from the one in its name if something goes wrong. Finally, drop the two newest (first?) files from the list (copying the rest into a new result list?), so that in the next step I can delete all the extra files on that list, leaving only the two latest versions.
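The steps above can be sketched as follows. This is only one possible approach, assuming every versioned name ends in " YYYY-MM-DD HHMMSS" plus an optional extension; `version_key`, `files_to_delete`, and the `keep` parameter are hypothetical names. It sorts with a key of (original name, timestamp), groups by original name, and returns everything except the newest versions of each group. Nothing is deleted here.

```python
import re
from datetime import datetime
from itertools import groupby

# Assumed suffix of every versioned name: " YYYY-MM-DD HHMMSS" plus optional ".ext"
SUFFIX_RE = re.compile(r" (\d{4}-\d{2}-\d{2} \d{6})(?:\.[^.]+)?$")


def version_key(name):
    """Sort key: (original file name, timestamp taken from the file name)."""
    match = SUFFIX_RE.search(name)
    if match is None:
        raise ValueError("not a versioned file name: " + name)
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H%M%S")
    return name[:match.start()], stamp


def files_to_delete(names, keep=2):
    """Return the versions that are older than the `keep` newest per original file."""
    # Ignore anything that does not look like a versioned name.
    versioned = [n for n in names if SUFFIX_RE.search(n)]
    ordered = sorted(versioned, key=version_key)
    extra = []
    for _, group in groupby(ordered, key=lambda n: version_key(n)[0]):
        versions = list(group)  # oldest first within one original file
        extra.extend(versions[:max(len(versions) - keep, 0)])
    return extra


names = [
    "archive.pst 2016-10-19 080101.pst",
    "Outlook.pst 2016-10-20 080101.pst",
    "archive.pst 2016-10-14 080101.pst",
    "archive.pst 2016-10-20 080101.pst",
    "Outlook.pst 2016-10-19 080101.pst",
]
print(files_to_delete(names))
```

Because the timestamp comes from the file name rather than from the file's modification time, the result does not depend on the dates stored in the file system. Once the printed list looks right on a test folder, each entry could be passed to os.remove.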
The problem is that I am new to Python and all of this is WAY above my understanding, and the HDD is already pretty full, so I am deleting huge files manually.
The code to reproduce this looks like this:
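import os
from os import listdir
from os.path import isfile, join


def ljfilter(a):
    # def code here
    return a  # placeholder so the script runs: sorts by plain file name


MyDir = 'c:/test'
os.chdir(MyDir)
# Collect regular files only, skipping subfolders
lista = [f for f in listdir(MyDir) if isfile(join(MyDir, f))]
for item in sorted(lista, key=ljfilter):
    print(item)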