Search for duplicated files
#12
I tried blake2b but got better results with xxhash.

I found out that the block size matters a lot. Overall I got better results (with larger files) when I used 2**17 (131,072 bytes) instead of 1 MB.
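
For reference, the chunked read loop is roughly like this (a minimal sketch, not my exact script; it assumes the python-xxhash package and uses the 128 KB block size mentioned above):

import xxhash

BLOCK_SIZE = 2 ** 17  # 131072 bytes; gave better throughput for me than 1 MB

def file_hash(path, block_size=BLOCK_SIZE):
    """Hash a file in fixed-size chunks so big files are never read into memory at once."""
    h = xxhash.xxh64()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(block_size), b''):
            h.update(chunk)
    return h.hexdigest()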

Using concurrent.futures worsened the execution time, and multiprocessing.Pool seemed to have no effect. I don't know; maybe it has something to do with how the hashes are calculated.
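
The parallel attempt was along these lines (again just a sketch, not my exact code; file_hash is the chunked hasher above, and since the work is mostly disk I/O the extra processes may simply not buy anything):

from concurrent.futures import ProcessPoolExecutor

def hash_all(paths):
    # Hash files in worker processes; with a single disk the reads,
    # not the hashing, tend to be the bottleneck, so this may not help.
    with ProcessPoolExecutor() as pool:
        return dict(zip(paths, pool.map(file_hash, paths)))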

My current updated script took 610 seconds (about 10 minutes) to find 14,170 duplicates among 53,491 files and 7,988 folders totalling 165 GB.

Now I'll try changing the block/buffer size according to the file size, but I don't know exactly how the buffer size affects hashing speed relative to file size.
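
The idea would be something like the following (purely hypothetical; the cut-offs are guesses I would still have to benchmark):

import os

def pick_block_size(path):
    # Choose the read buffer from the file size; the thresholds below are guesses.
    size = os.path.getsize(path)
    if size <= 1 << 20:       # up to 1 MB: read it in one go
        return max(size, 1)
    if size <= 1 << 28:       # up to 256 MB: keep the current 128 KB default
        return 1 << 17
    return 1 << 20            # very large files: try a 1 MB buffer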