Python Forum

how to check for file type in a folder
Hello, I want the script to go through the entire folder and list only the files that are neither Zip nor Rar files,

but when I use this code, it just goes through the entire folder and lists all the files. What am I doing wrong?

import zipfile, os, rarfile, unicodedata

from rarfile import RarFile
rootFolder = u"C:/Users/user/Desktop/archives/"

from zipfile import ZipFile
rootFolder = u"C:/Users/user/Desktop/archives/"

zipfiles = [os.path.join(rootFolder, f) for f in os.listdir(rootFolder)]
[print(i) for i in zipfiles if not isinstance(i, ZipFile) and not isinstance(i, RarFile)]
First, you assign rootFolder twice, but that is not the problem.
The last line prints every element, because none of them is an instance of ZipFile or RarFile.
They are all strings.
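To make this concrete, here is a minimal sketch with made-up file names standing in for whatever os.listdir() returns. It only illustrates that os.path.join produces plain str objects, so an isinstance check against ZipFile can never filter anything out.

```python
import os
from zipfile import ZipFile

# Hypothetical names standing in for the result of os.listdir().
names = ["a.zip", "b.rar", "notes.txt"]
paths = [os.path.join("archives", name) for name in names]

# Every element is a str; none is a ZipFile, whatever the name says.
all_strings = all(isinstance(p, str) for p in paths)
any_zipfile = any(isinstance(p, ZipFile) for p in paths)

# The filter from the question therefore keeps every path.
kept = [p for p in paths if not isinstance(p, ZipFile)]
print(all_strings, any_zipfile, len(kept) == len(paths))
```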

zipfiles = [os.path.join(rootFolder, f) for f in os.listdir(rootFolder) if f.endswith('.rar') or f.endswith('.zip')]
This should give you a list of strings containing only the paths that end with .rar or .zip. Negate the condition to get the files that are not archives instead.
This makes your comprehension a little long. You can use a function to decide whether an element is added, or split the comprehension across multiple lines.

Multiline example:
zipfiles = [
    os.path.join(rootFolder, f) for f in os.listdir(rootFolder)
    if f.endswith('.rar') or f.endswith('.zip')
    ]
Or with a decider function:
def is_archive(file):
    register = ('.rar', '.zip')
    return any(file.endswith(ftype) for ftype in register)


zipfiles = [os.path.join(rootFolder, f) for f in os.listdir(rootFolder) if is_archive(f)]
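Since the question asks for the files that are not archives, the decider function can simply be negated. The following is only a sketch: the names ARCHIVE_SUFFIXES and list_non_archives are made up for illustration, and it assumes that the extension is a reliable indicator. It relies on str.endswith accepting a tuple of suffixes.

```python
import os

ARCHIVE_SUFFIXES = (".rar", ".zip")  # assumed set of archive extensions


def is_archive(name):
    # str.endswith accepts a tuple of suffixes; lower() makes ".ZIP" match too.
    return name.lower().endswith(ARCHIVE_SUFFIXES)


def list_non_archives(folder):
    """Return full paths of regular files in folder that do not look like archives."""
    result = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip subdirectories and anything with an archive extension.
        if os.path.isfile(path) and not is_archive(name):
            result.append(path)
    return result
```

Calling list_non_archives(rootFolder) would then return the paths to print.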
Another approach is to use pathlib in combination with glob.

from pathlib import Path


archive_folder = Path('your path')
rar_archives = list(archive_folder.glob('**/*.rar'))
zip_archives = list(archive_folder.glob('**/*.zip'))
The *_archives lists contain Path objects. Some functions and modules can't handle Path objects.
In that case, you can convert a Path object to a str with str(your_path_element).
The benefit of globbing is that you get only the matching files.

The '**/*.rar' pattern means that subdirectories are searched as well.
https://docs.python.org/3/library/pathli....Path.glob
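Globbing for one pattern at a time only returns the matches, so it does not directly give the files that are neither .rar nor .zip. One possible way around that, sketched here with a hypothetical helper named split_by_suffix, is to walk everything with rglob('*') and sort the files by Path.suffix. The suffix set is an assumption.

```python
from pathlib import Path

ARCHIVE_SUFFIXES = {".rar", ".zip"}  # assumed set of archive extensions


def split_by_suffix(folder):
    """Walk folder recursively; return (archives, others) as sorted lists of Path."""
    archives, others = [], []
    for path in Path(folder).rglob("*"):  # like glob("**/*")
        if not path.is_file():
            continue  # skip directories
        # Path.suffix is the last extension including the dot, e.g. ".zip".
        if path.suffix.lower() in ARCHIVE_SUFFIXES:
            archives.append(path)
        else:
            others.append(path)
    return sorted(archives), sorted(others)
```

The second list holds the non-archive files, including those in subdirectories.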
ZipFile is a class for reading or writing zip files. Your zipfiles list is a list of strings that are file paths. They're instances of str, not ZipFile. I would normally do this by checking the extension of the paths in zipfiles (if i[-3:] not in ('zip', 'rar')). If you are worried that some have the wrong extension, you would have to open them up with the ZipFile and RarFile classes and see if there are errors trying to read them.

Edit: Dead_Eye beat me with a much better explanation.
(Sep-15-2018, 01:26 PM)DeaD_EyE Wrote: First you create rootFolder twice, which is not the problem. [...]

That's not possible. The file extensions in the folder are as follows: .0, .1, .2, .3, ... up to .1999
Then you will have to do what I said. Try to open each one in turn (with ZipFile and RarFile), and see if you get an error. If you don't get any errors, add it to your list.
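As a rough sketch of checking by content rather than by name: instead of opening each file with ZipFile and RarFile and catching errors, this variant uses zipfile.is_zipfile() from the standard library and a hand-written check for the leading RAR signature bytes, so it runs without the third-party rarfile package. The names RAR_MAGIC, looks_like_rar and classify are made up for illustration, and the signature check assumes plain (not self-extracting) RAR files. If rarfile is installed, its is_rarfile() function could be used in place of looks_like_rar.

```python
import os
import zipfile

RAR_MAGIC = b"Rar!\x1a\x07"  # common prefix of the RAR 4.x and RAR 5 signatures


def looks_like_rar(path):
    # Assumption: a plain (non self-extracting) RAR file starts with the signature.
    with open(path, "rb") as fh:
        return fh.read(len(RAR_MAGIC)) == RAR_MAGIC


def classify(folder):
    """Split regular files in folder into (archives, others) by content, not by name."""
    archives, others = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # zipfile.is_zipfile() inspects the file content, not its extension.
        if zipfile.is_zipfile(path) or looks_like_rar(path):
            archives.append(path)
        else:
            others.append(path)
    return archives, others
```

The second list returned by classify(rootFolder) would be the files that are neither Zip nor Rar, regardless of extensions such as .0 or .1999.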