Python Forum
Compare folder A and subfolder B and display files that are in folder A but not in su
#1
I need code that compares c:\Folder-Oana\extracted\ and c:\Folder-Oana\extracted\translated\ and shows the files that are in the first folder but not in the second. In other words, compare folder A with its subfolder B and display the files that are in A but not in B. Folder A contains 1880 HTML files and folder B only 50, so the code should list each of the remaining files (the difference between 1880 and 50).

This is my code (two versions). I also tried several other approaches with ChatGPT, but none of them worked. I believe the problem is that one folder is a subfolder of the other, rather than two separate folders with no subfolders in them.

Version 1

import os

folder1 = r"C:\Folder-Oana\extracted"
folder2 = r"C:\Folder-Oana\extracted\translated"

# Get the list of HTML files in each folder
html_files_folder1 = [f.lower() for f in os.listdir(folder1) if f.lower().endswith('.html')]
html_files_folder2 = [f.lower() for f in os.listdir(folder2) if f.lower().endswith('.html')]

# Find the differences between the two file lists
missing_files = list(set(html_files_folder1) - set(html_files_folder2))

# Display the missing files
if missing_files:
    print("The HTML files that are in folder 1 but not in folder 2 are:")
    for filename in missing_files:
        print(filename)
else:
    print("There are no HTML files that are in folder 1 but not in folder 2.")
Version 2

import os

folder1 = r'C:\Folder-Oana\extracted\translated'
folder2 = r'C:\Folder-Oana\extracted'

# Function that returns the list of HTML files in a folder
def get_html_files(folder):
    html_files = []
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.lower().endswith('.html'):
                html_files.append(file)
    return html_files

# Get the list of HTML files for each folder
html_files_folder1 = get_html_files(folder1)
html_files_folder2 = get_html_files(folder2)

# Check for files that are in folder 1 but not in folder 2
missing_files = [file for file in html_files_folder1 if file not in html_files_folder2]

# Display the files and the folder in which they are found
for file in missing_files:
    if file in html_files_folder1:
        print(f"File {file} is in folder {folder1}")
    if file in html_files_folder2:
        print(f"File {file} is in folder {folder2}")
#2
To get the list of files, use pathlib instead of os:
from pathlib import Path


home = Path(".")
base = home / "Folder-Oana"

folder1 = base / "extracted"
folder2 = base / "translated"

def get_folder_contents(dirname, filter):
    if filter:
        return [file for file in dirname.iterdir() if file.is_file() and file.suffix == filter]
    else:
        return [file for file in dirname.iterdir() if file.is_file()]

html_files_folder1 = get_folder_contents(dirname=folder1, filter=".html")
html_files_folder2 = get_folder_contents(dirname=folder2, filter=".html")
Note that in this code both paths are subdirectories of Folder-Oana.
#3
In version 2, get_html_files returns all files in the folder plus the files in its subdirectories, because os.walk descends into them. For obvious reasons this will not work if you want to find out which files are in C:\Folder-Oana\extracted but not in C:\Folder-Oana\extracted\translated. Every file found under translated is also found when walking extracted, because translated is a subdirectory of extracted. Version 2 also assigns the two paths the other way round from version 1, so its list of missing files comes out empty.
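A minimal sketch of that effect, using a throwaway directory tree rather than the real folders. The file names a.html and b.html and both helper functions are made up for illustration; only the folder name "translated" comes from the original post.

```python
import os
import tempfile
from pathlib import Path


def walk_html_names(folder):
    """Collect .html file names from folder and every subfolder (os.walk recurses)."""
    names = []
    for _root, _dirs, files in os.walk(folder):
        names.extend(f for f in files if f.lower().endswith(".html"))
    return names


def top_level_html_names(folder):
    """Collect .html file names from folder only, ignoring subfolders."""
    return [
        entry.name
        for entry in os.scandir(folder)
        if entry.is_file() and entry.name.lower().endswith(".html")
    ]


# Hypothetical layout: parent/{a.html, b.html} and parent/translated/{a.html}
with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp)
    child = parent / "translated"
    child.mkdir()
    for name in ("a.html", "b.html"):
        (parent / name).touch()
    (child / "a.html").touch()

    # Version 2 style: walking the parent also returns the child's files,
    # so nothing in the child can ever be "missing" from the parent.
    walked_missing = set(walk_html_names(child)) - set(walk_html_names(parent))

    # Version 1 style: only the top level of each folder is listed.
    top_level_missing = set(top_level_html_names(parent)) - set(top_level_html_names(child))

print("walk-based difference:", walked_missing)
print("top-level difference:", top_level_missing)
```

The walk-based difference is empty no matter how many files the parent holds, while the top-level difference reports the file that exists only in the parent.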

I don't know why version 1 wouldn't work. It worked fine for me.
import os

a = set(f.lower() for f in os.listdir(".") if f.lower().endswith(".py"))
b = set(f.lower() for f in os.listdir("./test") if f.lower().endswith(".py"))

print("A or B", *(a | b), sep="\n")
print("", "A and B", *(a & b), sep="\n")
print("", "A but not B", *(a - b), sep="\n")
print("", "B but not A", *(b - a), sep="\n")
Output:
A or B
junk.py
console.py
junk2.py
junk3 copy.py
junk3.py
pythonhighlighter.py
interactiveconsole.py
sqlite_demo.py
monkeypatching.py

A and B
junk.py
junk2.py
junk3.py

A but not B
console.py
pythonhighlighter.py
interactiveconsole.py
sqlite_demo.py
monkeypatching.py

B but not A
junk3 copy.py
There may be a slight problem with Larz60+'s code: files with the extension ".HTML" will not be included in the list because "html" != "HTML".
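One possible way around that, sketched with pathlib: lower-case both the suffix and the name before comparing, then take a set difference. The function names html_names and missing_from_subfolder are made up for this example, and the paths in the usage comment assume the folder layout from the first post.

```python
from pathlib import Path


def html_names(folder):
    """Return the lower-cased names of the .html files directly inside folder."""
    return {
        p.name.lower()
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".html"
    }


def missing_from_subfolder(parent, child):
    """Names present in parent but absent from child, sorted for stable output."""
    return sorted(html_names(parent) - html_names(child))


# Possible usage (paths assumed from the original question):
# for name in missing_from_subfolder("C:/Folder-Oana/extracted",
#                                    "C:/Folder-Oana/extracted/translated"):
#     print(name)
```

Because iterdir() does not descend into subdirectories, the files in translated are not counted as part of extracted.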

Stop using "\" and start using "/" as the separator in file paths. Windows accepts "/", and it eliminates the confusion of "\" possibly being the start of an escape sequence.
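A small illustration of the escape-sequence confusion. The path C:\temp\new is a made-up example chosen because \t and \n are both escape sequences; it is not a path from this thread.

```python
from pathlib import PureWindowsPath

# In a normal string literal, \t becomes a tab and \n becomes a newline
plain = "C:\temp\new"

# A raw string keeps the backslashes as written
raw = r"C:\temp\new"

# Forward slashes need no escaping at all
forward = "C:/temp/new"

print(repr(plain))
print(repr(raw))

# PureWindowsPath treats both separators as equivalent
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)
print(same_path)
```

The raw strings in the original post avoid the problem too; forward slashes just remove the need to remember the r prefix.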
#4
Try glob?

import glob

# Without recursive=True, '**' behaves like '*', so path1 matches one directory level down
path1 = '/home/pedro/summer2021/**/'
path2 = '/home/pedro/summer2021/EC/*'
all_files1 = glob.glob(path1 + '*.odt')
all_files2 = glob.glob(path2 + '*.odt')
# Files found by both patterns
intersect = set(all_files1) & set(all_files2)
# Files found by the first pattern only
exceptions = [f for f in all_files1 if f not in intersect]
len(exceptions)  # 26
len(all_files1)  # 33
len(intersect)  # 7
If memory is a problem, you can use iglob():

all_files = glob.iglob(path1 + '*.odt')
iglob returns an iterator that yields the matches one at a time, not a list.
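A short sketch of what that means in practice, run against a temporary directory. The file names one.odt, two.odt and skip.txt are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to match against
    for name in ("one.odt", "two.odt", "skip.txt"):
        open(os.path.join(tmp, name), "w").close()

    matches = glob.iglob(os.path.join(tmp, "*.odt"))
    is_list = isinstance(matches, list)

    # Consuming the iterator yields each matching path once
    names = sorted(os.path.basename(p) for p in matches)

    # A second pass finds nothing, because the iterator is already exhausted
    leftover = list(matches)

print(is_list, names, leftover)
```

If the results are needed more than once, either call glob.glob() to get a list or store the iglob results in a set while iterating.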