Python Forum
a future project: hardlink identical files
(Dec-16-2017, 08:48 PM)ezdev Wrote: one reason i wouldn't use this is that you don't know which file will be the link and which is the real file.

certainly you can check if a file is a link before you remove it. but if you don't do that every time, you could find yourself deleting the one real file, and then the others are no good either.

so what you're really doing is making it so that every time you delete a file, you have to check that no links are relying on it. this is always true of links, but by increasing the number of links to "any files that have duplicates" you nearly defeat the purpose of having duplicates in the first place. i suppose in a few instances this would be worthwhile.

this is about hard links, not symbolic links.  both paths being linked are real files.  if they are already linked to each other, it would be an optimization to skip linking them, provided that can be detected in O(1) time without doing any more syscalls.  this kind of linking just takes 2 paths (names) that previously referenced (pointed to, or linked to) different inodes, which have been compared and found to be identical, and makes both reference just one of those inodes.
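a minimal sketch of that "already linked" check, assuming the scan has already collected an os.lstat() result for each path (so the comparison itself costs no extra syscalls).  the function name already_linked is made up for illustration:

```python
import os


def already_linked(st_a: os.stat_result, st_b: os.stat_result) -> bool:
    """Return True if both stat results describe the same inode.

    Two paths are hard links to each other exactly when they share
    both the device number and the inode number. The inode number
    alone is not enough, because it is only unique per filesystem.
    """
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```

the stat results would normally come from the same os.lstat() calls that were needed anyway to get the file sizes, so this is just a tuple comparison.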

usually, hardlinking two identical files won't matter.  but there are some odd cases to watch out for.

1, if checking that metadata is identical, such as the timestamp or the owner, is not done, the hardlinking step can result in a given file path effectively changing metadata.  one piece of metadata that would be wrong to compare is the reference to the inode, the inode number.

2, similarly, and perhaps more confusing to many, the file paths being linked may already have other links outside the scope this run will be looking at.  but this all falls under the warning to not attempt this kind of compaction where link relations matter, such as a file designated to be where in-place changes are made and where other linked paths are expected to see the changes (or in special cases, not see them).

3, the comparison may compare two files where one of them is really a symlink.  detecting this is really a metadata check that should never be allowed to be disabled.  if both paths are already symlinks to different files, hardlinking the two symlinks (yes, you can do that in POSIX, BSD, and Linux) might seem right, but it can leave a dangling file, depending on how things are referenced in actual usage.
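the three odd cases above could be detected along these lines.  this is only a sketch: link_hazards, in_scope_counts, and check_meta are hypothetical names, and which metadata fields to compare is an assumption (mode, owner, group, and mtime here):

```python
import os
import stat


def link_hazards(path_a, path_b, in_scope_counts=None, check_meta=True):
    """Return a list of reasons why hardlinking the two paths may be unsafe.

    in_scope_counts (hypothetical) maps (st_dev, st_ino) to the number of
    paths this run has seen for that inode; inodes not listed count as 1.
    """
    in_scope_counts = in_scope_counts or {}
    # lstat so that symlinks are examined themselves, never followed
    st_a, st_b = os.lstat(path_a), os.lstat(path_b)
    problems = []

    # case 3: a symlink (or any non-regular file) is involved;
    # this check is deliberately not optional
    if stat.S_ISLNK(st_a.st_mode) or stat.S_ISLNK(st_b.st_mode):
        problems.append("symlink")
    elif not (stat.S_ISREG(st_a.st_mode) and stat.S_ISREG(st_b.st_mode)):
        problems.append("not a regular file")

    # case 1: metadata differs; the inode number is intentionally left out
    if check_meta:
        meta_a = (st_a.st_mode, st_a.st_uid, st_a.st_gid, st_a.st_mtime_ns)
        meta_b = (st_b.st_mode, st_b.st_uid, st_b.st_gid, st_b.st_mtime_ns)
        if meta_a != meta_b:
            problems.append("metadata differs")

    # case 2: the inode has more links than this run knows about
    for path, st in ((path_a, st_a), (path_b, st_b)):
        seen = in_scope_counts.get((st.st_dev, st.st_ino), 1)
        if st.st_nlink > seen:
            problems.append("links outside scope: " + str(path))

    return problems
```

an empty list means none of these particular hazards were found; it says nothing about whether the file contents match.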

so a good version of this program would allow specifying what to do in the odd cases, and would detect cases of trouble.
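as a rough illustration of what the linking step with a selectable policy might look like, here is a sketch.  hardlink_duplicate, the on_odd parameter, and the ".hardlink-tmp" suffix are all invented for this example, and treating "the duplicate has other links" as an odd case is just one possible policy:

```python
import filecmp
import os
import stat


def hardlink_duplicate(keep, dup, on_odd="skip"):
    """Replace dup with a hard link to keep if both are identical regular files.

    Returns True if a link was made. on_odd (hypothetical) is "skip" to
    silently leave odd cases alone or "raise" to raise ValueError.
    """
    st_keep, st_dup = os.lstat(keep), os.lstat(dup)
    if (st_keep.st_dev, st_keep.st_ino) == (st_dup.st_dev, st_dup.st_ino):
        return False  # already the same inode, nothing to do

    odd = None
    if not (stat.S_ISREG(st_keep.st_mode) and stat.S_ISREG(st_dup.st_mode)):
        odd = "not a regular file (possibly a symlink)"
    elif st_keep.st_dev != st_dup.st_dev:
        odd = "on a different filesystem"
    elif st_dup.st_nlink > 1:
        odd = "has other links that would be left behind"
    if odd:
        if on_odd == "raise":
            raise ValueError(f"{dup}: {odd}")
        return False

    # byte-by-byte comparison; shallow=False ignores the stat shortcut
    if not filecmp.cmp(keep, dup, shallow=False):
        return False

    # link under a temporary name, then rename over dup, so that dup
    # never disappears if something fails part way through
    tmp = dup + ".hardlink-tmp"
    os.link(keep, tmp)
    os.replace(tmp, dup)
    return True
```

linking to a temporary name and renaming it over the duplicate avoids a window where the path does not exist at all.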


Messages In This Thread
RE: a future project: hardlink identical files - by Skaperen - Dec-17-2017, 04:28 AM
