Python Forum
Find duplicate files in multiple directories
(Dec-26-2022, 08:38 PM) Pavel_47 Wrote: It's about duplicate files (sure, with the same filenames), located in different folders.
The problem is to get some kind of dataset where each duplicated file (or filename) is associated with the locations where this file exists.
You should use a hash of each file's contents, rather than identifying files by name, extension, size or some other attribute; none of those guarantees that two files actually have the same content.
With pathlib you get the full path of every file (rglob searches all subfolders recursively), so you can see where each duplicate is, and hashlib works fine for the hashing.
Example:
import hashlib
from pathlib import Path

def compute_hash(file_path, chunk_size=65536):
    """Return the MD5 hex digest of a file's contents."""
    hash_obj = hashlib.md5()
    with open(file_path, 'rb') as f:
        # Read in chunks so large files don't have to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()

def find_duplicate(root_path):
    hashes = {}  # hash -> first path seen with that content
    # rglob('*') walks root_path and all of its subfolders
    for file_path in root_path.rglob('*'):
        if file_path.is_file():
            hash_value = compute_hash(file_path)
            if hash_value in hashes:
                print(f'Duplicate file found: {file_path}')
            else:
                hashes[hash_value] = file_path

if __name__ == '__main__':
    root_path = Path(r'G:\div_code')
    find_duplicate(root_path)
Output:
Duplicate file found: G:\div_code\test_cs\foo\bar.txt
Duplicate file found: G:\div_code\test_cs\foo\some_folder\egg2.txt
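The loop above only prints the second and later copies of a file, so it doesn't tell you where the first copy lives. To get the kind of dataset asked for (each duplicate together with all of its locations), one option is to collect every path per hash and keep only the hashes that occur more than once. A minimal sketch, assuming all files sit under one root folder; `group_duplicates` is a hypothetical helper name and the 64 KiB chunk size is an arbitrary choice:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def compute_hash(file_path, chunk_size=65536):
    """Return the MD5 hex digest of a file's contents."""
    hash_obj = hashlib.md5()
    with open(file_path, 'rb') as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()


def group_duplicates(root_path):
    """Hypothetical helper: map each content hash to every path holding it.

    Only hashes found at two or more paths are kept, so the result
    contains duplicates only.
    """
    by_hash = defaultdict(list)
    # rglob('*') yields every entry below root_path, in all subfolders.
    for file_path in root_path.rglob('*'):
        if file_path.is_file():
            by_hash[compute_hash(file_path)].append(file_path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Called as `group_duplicates(Path(r'G:\div_code'))`, it returns a dict mapping each shared hash to a list of paths, so you can loop over `.items()` and print each group with all its locations. MD5 is fine for spotting accidental duplicates, but it is not collision-resistant, so `hashlib.sha256()` is the safer pick if files could be deliberately crafted to collide.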


Messages In This Thread
RE: Find duplicate files in multiple directories - by snippsat - Dec-26-2022, 11:15 PM
