Python Forum
Search for duplicated files
#11
os.walk is already built on os.scandir, at least as of Python 3.6.
However, I have tried to make it faster by using concurrent.futures.ProcessPoolExecutor to run the hashing on all cores, but I don't know how much it helps. I expected md5 hashing to be a CPU-heavy operation, but on my old laptop it turns out it isn't: the speed depends on disk performance.
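Here is a minimal sketch of the process-pool idea described above. It is not the script from this thread: the names sha1_of_file, hash_all and group_duplicates are hypothetical, and the 1 MiB read size and the chunksize of 16 are assumptions picked for illustration.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return (path, hex digest), reading in chunks to keep memory bounded."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def hash_all(paths, workers=None):
    """Hash every path in a pool of worker processes (one per core by default)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches the paths so fewer messages cross process borders.
        return list(pool.map(sha1_of_file, paths, chunksize=16))


def group_duplicates(hashed):
    """Group (path, digest) pairs and keep only digests seen more than once."""
    groups = defaultdict(list)
    for path, digest in hashed:
        groups[digest].append(path)
    return {digest: found for digest, found in groups.items() if len(found) > 1}
```

On platforms that start workers with spawn (Windows, recent macOS), hash_all should be called from under an `if __name__ == "__main__":` guard. If the disk is the bottleneck, as it seems to be on my laptop, extra processes will probably not buy much.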
I have switched from md5 to sha1 because of the possibility that different files produce the same hash. The chance is minimal, but still... I may change it again to blake2b. If it turns out that the CPU is not the bottleneck, I will try asyncio instead. I have tried already, but without success. That library makes my head explode. I will also try hashing pieces of the file rather than the whole file. According to the BLAKE2 website, blake2b can do 1 GB per second.
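A rough sketch of what hashing only pieces of a file with blake2b could look like. The function name partial_blake2b, the 64 KiB piece size, and the choice of "file size plus first and last piece" are all assumptions for illustration, not what the script does today.

```python
import hashlib
import os


def partial_blake2b(path, piece_size=64 * 1024):
    """Hash the file size plus the first and last pieces of the file."""
    size = os.path.getsize(path)
    digest = hashlib.blake2b(digest_size=32)
    # Mixing in the size keeps files of different lengths apart.
    digest.update(str(size).encode())
    with open(path, "rb") as handle:
        digest.update(handle.read(piece_size))
        if size > piece_size:
            # Jump to the last piece without re-reading the first one.
            handle.seek(max(piece_size, size - piece_size))
            digest.update(handle.read(piece_size))
    return digest.hexdigest()
```

Two files that differ only in the middle get the same partial digest, so a match here is only a candidate. It still needs a full hash or a byte-by-byte comparison before anything is treated as a duplicate.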
I made a few more changes. If I run the script on my /home directory while I am using the web browser, for example, it exits with errors. That is expected because of the .cache directory, whose files change while the scan is running. I am ignoring those errors for now, but it seems better to skip the whole directory.
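Skipping a whole directory with os.walk can be done by pruning the directory list in place during a top-down walk. A small sketch, assuming a hypothetical helper called iter_files and a skip list that holds only the cache directory name:

```python
import os


def iter_files(root, skip_dirs=(".cache",)):
    """Yield file paths under root without descending into skip_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells a top-down os.walk not to visit
        # the removed entries at all.
        dirnames[:] = [name for name in dirnames if name not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skips broken symlinks, sockets and other non-regular entries.
            if os.path.isfile(path):
                yield path
```

Files can still vanish between listing and hashing, so catching OSError around the open call is probably still worth keeping.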
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org


Messages In This Thread
Search for duplicated files - by wavic - Oct-02-2017, 01:28 AM
RE: Search for duplicated files - by wavic - Oct-02-2017, 04:47 PM
RE: Search for duplicated files - by wavic - Oct-04-2017, 07:58 AM
RE: Search for duplicated files - by DeaD_EyE - Oct-04-2017, 08:44 AM
RE: Search for duplicated files - by wavic - Oct-04-2017, 08:59 AM
RE: Search for duplicated files - by DeaD_EyE - Oct-04-2017, 09:01 AM
RE: Search for duplicated files - by wavic - Oct-04-2017, 09:54 AM
RE: Search for duplicated files - by hbknjr - Oct-11-2017, 05:01 PM
RE: Search for duplicated files - by wavic - Oct-12-2017, 03:06 PM
RE: Search for duplicated files - by hbknjr - Oct-12-2017, 03:43 PM
RE: Search for duplicated files - by wavic - Oct-12-2017, 11:54 PM
RE: Search for duplicated files - by hbknjr - Oct-13-2017, 07:22 AM
