Dec-26-2022, 11:15 PM
(Dec-26-2022, 08:38 PM)Pavel_47 Wrote: It's about duplicate files (sure, with the same filenames), located in different folders. The problem is to get some kind of dataset where each duplicated file (or filename) is associated with the locations where this file exists.
You should use a hash of each file's contents, rather than identifying files by name, extension, size, or some other designation, none of which is guaranteed to work. pathlib gives the full path of every file (walking the tree recursively with rglob), and hashlib works fine for the hashing. Example:
import hashlib
from pathlib import Path

def compute_hash(file_path):
    # MD5 of the whole file contents, read in one go
    with open(file_path, 'rb') as f:
        hash_obj = hashlib.md5()
        hash_obj.update(f.read())
    return hash_obj.hexdigest()

def find_duplicate(root_path):
    hashes = {}
    # rglob('*') walks root_path recursively
    for file_path in root_path.rglob('*'):
        # print(file_path)
        if file_path.is_file():
            hash_value = compute_hash(file_path)
            if hash_value in hashes:
                print(f'Duplicate file found: {file_path}')
            else:
                hashes[hash_value] = file_path

if __name__ == '__main__':
    root_path = Path(r'G:\div_code')
    find_duplicate(root_path)
Output:
Duplicate file found: G:\div_code\test_cs\foo\bar.txt
Duplicate file found: G:\div_code\test_cs\foo\some_folder\egg2.txt
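Since the goal was a dataset where each duplicated file is associated with all of its locations, here is a minimal variant sketch that collects every path per content hash into a dict instead of printing only the later hits. It assumes the same G:\div_code root as above (adjust to your tree), and it reads files in chunks so large files don't have to fit in memory.

import hashlib
from collections import defaultdict
from pathlib import Path

def compute_hash(file_path, chunk_size=65536):
    # Hash in fixed-size chunks so large files don't need to fit in memory
    hash_obj = hashlib.md5()
    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()

def find_duplicates(root_path):
    # Map each content hash to every path that has that content
    hashes = defaultdict(list)
    for file_path in root_path.rglob('*'):
        if file_path.is_file():
            hashes[compute_hash(file_path)].append(file_path)
    # Keep only hashes seen more than once, i.e. real duplicates
    return {h: paths for h, paths in hashes.items() if len(paths) > 1}

if __name__ == '__main__':
    root_path = Path(r'G:\div_code')  # placeholder root, change to your folder
    for h, paths in find_duplicates(root_path).items():
        print(f'{h}:')
        for p in paths:
            print(f'    {p}')

MD5 is fine here because it is only used to bucket identical content, not for security; swap in hashlib.sha256 if collisions are a concern.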