Python Forum
Counting the number of files related to particular format




Counting the number of files related to particular format - ambush - Nov-05-2018

In a folder there are files in different formats, such as jpeg, jpg, pdf and psd. The following program counts the files of each particular format. If a new file format appears in the folder, a new function has to be added. Instead of adding a new function for every file format, how can I write a more generalized program that works for any file type, even when files are added dynamically?


import os 
filter = []
for filepath,dir,filenames in os.walk(r'C:\Users\sai\Desktop\dosc'):
		for element in filenames:
			filter.append(element)



newstr = filter
print(len(newstr))


def jpegfilter(x):
	jpeglist =[]
	for line in x:
		if line.endswith('.jpeg') | line.endswith('.jpg') | line.endswith('.JPG'):
			jpeglist.append(line)

	return jpeglist

def pdffilter(x):
	pdflist = []
	for line in x:
		if line.endswith('.pdf'):
			pdflist.append(line)

	return pdflist

def psdfilter(x):
	psdlist = []
	for line in x:
		if line.endswith('.psd'):
			psdlist.append(line)

	return psdlist

def docfilter(x):
	doclist = []
	for line in x:
		if line.endswith('.doc'):
			doclist.append(line)

	return doclist


print("number of jpeg files-{}".format(len(jpegfilter(newstr))))
print("number of pdf files-{}".format(len(pdffilter(newstr))))
print("number of doc files-{}".format(len(docfilter(newstr))))
print("number of psd files-{}".format(len(psdfilter(newstr))))
print("Total number of files-{}".format( len(jpegfilter(newstr)) + len(pdffilter(newstr)) + len(docfilter(newstr)) + len(psdfilter(newstr))) )



RE: Counting the number of files related to particular format - gontajones - Nov-05-2018

You can use the glob module. glob.glob() takes one pattern at a time, so loop over a list of extension patterns and let glob do the matching.

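import glob

img_files = set()  # a set, because on Windows matching ignores case, so '*.jpg' and '*.JPG' hit the same files
for img_ext in ["*.jpeg", "*.jpg", "*.JPG"]:  # the enumerate() index was never used
    # each pattern is matched against the current working directory only
    img_files.update(glob.glob(img_ext))
print(len(img_files))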
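Building on the glob idea, a minimal sketch that counts several groups of patterns in one call and also searches subfolders, like os.walk does. The function name count_with_glob and the shape of the groups dictionary are hypothetical; it assumes Python 3.5+ for recursive=True and uses glob.escape() in case the folder name itself contains glob metacharacters.

```python
import glob
import os
import tempfile


def count_with_glob(folder, groups):
    """Count files under `folder`, including subfolders, per group of glob patterns.

    `groups` maps a label to a list of patterns, e.g. {'jpg': ['*.jpg', '*.jpeg']}.
    Matches are collected in a set, so a file hit by two patterns of the same
    group is counted only once.
    """
    counts = {}
    for label, patterns in groups.items():
        matches = set()
        for pattern in patterns:
            # glob.escape() protects folder names containing [, ] or *;
            # '**' with recursive=True also descends into subfolders
            full_pattern = os.path.join(glob.escape(folder), '**', pattern)
            matches.update(glob.glob(full_pattern, recursive=True))
        counts[label] = len(matches)
    return counts


# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    for name in ['a.jpg', 'b.jpeg', os.path.join('sub', 'c.jpg'), 'd.pdf']:
        open(os.path.join(tmp, name), 'w').close()
    counts = count_with_glob(tmp, {'jpg': ['*.jpg', '*.jpeg'], 'pdf': ['*.pdf']})

print(counts)  # {'jpg': 3, 'pdf': 1}
```

Adding a new file type then only means adding one entry to the groups dictionary, not a new function.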



RE: Counting the number of files related to particular format - Gribouillis - Nov-05-2018

You can use a dictionary that maps each extension to a file type, and a collections.Counter:
from collections import Counter
import os

keymap = {}

def set_key(key, *extensions):
    for ext in extensions:
        keymap[ext] = key

set_key('jpg', '.jpg', '.jpeg', '.JPG')
set_key('pdf', '.pdf')
set_key('psd', '.psd')
set_key('doc', '.doc')


counter = Counter()
for root, dirs, files in os.walk(r'C:\Users\sai\Desktop\dosc'):
    for f in files:
        ext = os.path.splitext(f)[-1]  # e.g. '.jpg', including the dot
        if ext in keymap:  # extensions not registered with set_key() are ignored
            counter[keymap[ext]] += 1

print(counter)
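The same idea wrapped in a reusable function, as a minimal sketch: count_by_key is a hypothetical name, and lowercasing the extension before the lookup is an assumption added here, so the map only needs lowercase entries ('.JPG' is then counted as 'jpg' without its own set_key entry).

```python
import os
import tempfile
from collections import Counter


def count_by_key(folder, keymap):
    """Walk `folder` recursively and count files per group name in `keymap`.

    `keymap` maps lowercase extensions (with the dot) to a group name.
    Extensions are lowercased before the lookup (an assumption of this sketch);
    files whose extension is not in `keymap` are skipped.
    """
    counter = Counter()
    for _root, _dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in keymap:
                counter[keymap[ext]] += 1
    return counter


keymap = {'.jpg': 'jpg', '.jpeg': 'jpg', '.pdf': 'pdf', '.psd': 'psd', '.doc': 'doc'}

# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ['a.JPG', 'b.jpeg', 'c.pdf', 'notes.txt']:
        open(os.path.join(tmp, name), 'w').close()
    counts = count_by_key(tmp, keymap)

print(counts)  # Counter({'jpg': 2, 'pdf': 1})
```

Since Counter returns 0 for missing keys, counts['psd'] is simply 0 when no psd files were found.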



RE: Counting the number of files related to particular format - buran - Nov-05-2018

Use defaultdict from the collections module:

from collections import defaultdict
import os
all_files = defaultdict(list)
for root, folders, files in os.walk(r'C:\Users\sai\Desktop\dosc'):
    for item in files:
        ext = os.path.splitext(item)[-1][1:]  # extension without the leading dot, e.g. 'jpg'
        all_files[ext].append(item)

for ext, files in all_files.items():
    print('Number of {} files: {}'.format(ext, len(files)))
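The key expression os.path.splitext(item)[-1][1:] has a few edge cases worth knowing before relying on these counts. A short check with made-up file names (ext_key is a hypothetical helper): the key keeps the original case, so 'PDF' and 'pdf' become separate groups; only the last extension is used; and names without an extension, including dotfiles such as .bashrc, all land under the empty key ''.

```python
import os


def ext_key(name):
    # Same expression as in the defaultdict snippet: last extension, leading dot dropped
    return os.path.splitext(name)[-1][1:]


samples = ['photo.jpg', 'scan.PDF', 'archive.tar.gz', 'README', '.bashrc']
keys = {name: ext_key(name) for name in samples}
for name, key in keys.items():
    print('{!r:18} -> {!r}'.format(name, key))
```

If '.JPG' and '.jpg' should be counted together, lowercasing the key (ext_key(name).lower()) is one simple option.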
That said, let me point out some issues with your code:
1. filter is a built-in function; don't use it as a variable name.
2. There is no need to iterate over filenames one by one; use the list.extend() method instead.
3. There is no need for newstr: newstr = filter does not create a new list, it only binds a second name (an odd one, since it suggests a str object) to the same list. If you want the length of the list with all files, just use that list.
4. You create a separate function for each extension/file type. Iterating over the whole all-files list once per type is not efficient, but you can at least do it with one generic function: just pass a second argument, the file extension(s).
import os
from pathlib import Path

def get_files(all_files, extensions):
    # Path(item).suffix is the extension including the dot, e.g. '.jpg';
    # lower() makes '.JPG' match as well
    return [item for item in all_files if Path(item).suffix.lower() in extensions]

all_files = []
for root, folders, files in os.walk(r'C:\Users\BKolev\Desktop'):
    all_files.extend(files)

jpeg_files = get_files(all_files, ['.jpeg', '.jpg'])
print('Number of jpeg files: {}'.format(len(jpeg_files)))
Here I use the pathlib module instead of os.path, to show a different tool; you can use it in the previous snippet as well.
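As a minimal sketch of that suggestion, the earlier defaultdict snippet rewritten with pathlib (group_by_suffix is a hypothetical name). Path.rglob('*') takes the place of os.walk and Path.suffix the place of os.path.splitext; because rglob also yields directories, they are filtered out with is_file().

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_suffix(folder):
    """Group file names under `folder` (recursively) by suffix, pathlib style.

    Mirrors the os.walk/os.path.splitext snippet: the key is the suffix
    without its leading dot, and files without a suffix end up under ''.
    """
    all_files = defaultdict(list)
    for path in Path(folder).rglob('*'):
        if path.is_file():  # rglob also yields directories; skip them
            all_files[path.suffix[1:]].append(path.name)
    return all_files


# Demo on a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    for rel in ['a.jpg', 'b.pdf', 'sub/c.jpg', 'README']:
        (root / rel).touch()
    grouped = group_by_suffix(tmp)

for ext, files in sorted(grouped.items()):
    print('Number of {} files: {}'.format(ext, len(files)))
```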