Python Forum
Counting the number of files related to particular format
#1
A folder contains files in different formats such as jpeg, jpg, pdf and psd. The following program counts the files of each format. If a new file format shows up in the folder, a new function has to be added. Instead of adding a function for every file format, how can I write a more general program that handles any file type, including formats that are added to the folder later?


import os 
filter = []
for filepath,dir,filenames in os.walk(r'C:\Users\sai\Desktop\dosc'):
		for element in filenames:
			filter.append(element)



newstr = filter
print(len(newstr))


def jpegfilter(x):
	jpeglist =[]
	for line in x:
		if line.endswith(('.jpeg', '.jpg', '.JPG')):
			jpeglist.append(line)

	return jpeglist

def pdffilter(x):
	pdflist = []
	for line in x:
		if line.endswith('.pdf'):
			pdflist.append(line)

	return pdflist

def psdfilter(x):
	psdlist = []
	for line in x:
		if line.endswith('.psd'):
			psdlist.append(line)

	return psdlist

def docfilter(x):
	doclist = []
	for line in x:
		if line.endswith('.doc'):
			doclist.append(line)

	return doclist


print("number of jpeg files-{}".format(len(jpegfilter(newstr))))
print("number of pdf files-{}".format(len(pdffilter(newstr))))
print("number of doc files-{}".format(len(docfilter(newstr))))
print("number of psd files-{}".format(len(psdfilter(newstr))))
print("Total number of files-{}".format( len(jpegfilter(newstr)) + len(pdffilter(newstr)) + len(docfilter(newstr)) + len(psdfilter(newstr))) )
#2
You can use the glob module.
Loop over a list of patterns, one per extension, and glob will do the rest.

import glob

img_files = []
# One pattern per spelling of the extension; glob searches the current directory.
for img_ext in ["*.jpeg", "*.jpg", "*.JPG"]:
    img_files.extend(glob.glob(img_ext))
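Note that the patterns above only look in the current working directory, and on a case-sensitive file system each spelling of the extension needs its own pattern. Here is a minimal sketch of a recursive, case-insensitive variant, assuming Python 3.5+ (for `recursive=True`). The function name `find_by_extension` and its parameters are made up for illustration, they are not part of glob.

```python
import glob
import os


def find_by_extension(folder, extensions):
    """Return paths under folder whose extension is in extensions (case-insensitive).

    Hypothetical helper, not part of the glob module.
    """
    wanted = {ext.lower() for ext in extensions}
    # "**" together with recursive=True also descends into subdirectories.
    pattern = os.path.join(glob.escape(folder), "**", "*")
    return [
        path
        for path in glob.glob(pattern, recursive=True)
        if os.path.isfile(path) and os.path.splitext(path)[1].lower() in wanted
    ]
```

It could be called as find_by_extension(r'C:\Users\sai\Desktop\dosc', ['.jpg', '.jpeg']). Keep in mind that a glob "*" pattern skips names starting with a dot.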
#3
You can use a dictionary that maps each extension to a file type, together with a Counter:
from collections import Counter
import os

keymap = {}

def set_key(key, *extensions):
    for ext in extensions:
        keymap[ext] = key

set_key('jpg', '.jpg', '.jpeg', '.JPG')
set_key('pdf', '.pdf')
set_key('psd', '.psd')
set_key('doc', '.doc')


counter = Counter()
for root, dirs, files in os.walk(r'C:\Users\sai\Desktop\dosc'):
    for f in files:
        ext = os.path.splitext(f)[-1]  # e.g. '.jpg', including the dot
        if ext in keymap:
            counter[keymap[ext]] += 1
 
print(counter)
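If new formats should be picked up without registering them first, the keymap can become optional. This is a rough sketch under that assumption: every extension is lowercased and counted, and an optional alias table (a hypothetical `aliases` parameter, as is the function name `count_extensions`) merges spellings such as '.jpeg' into '.jpg'.

```python
from collections import Counter
import os


def count_extensions(folder, aliases=None):
    """Count files under folder per lowercased extension.

    aliases is an optional dict such as {'.jpeg': '.jpg'} (illustrative only).
    Files without an extension are counted under the empty string.
    """
    aliases = aliases or {}
    counter = Counter()
    for root, dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            # Fall back to the extension itself when no alias is registered.
            counter[aliases.get(ext, ext)] += 1
    return counter
```

For example, count_extensions(r'C:\Users\sai\Desktop\dosc', {'.jpeg': '.jpg'}) returns a Counter, and sum(counter.values()) gives the total number of files.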
#4
Use defaultdict from the collections module:

from collections import defaultdict
import os
all_files = defaultdict(list)
for root, folders, files in os.walk(r'C:\Users\sai\Desktop\dosc'):
    for item in files:
        ext = os.path.splitext(item)[-1][1:]
        all_files[ext].append(item)

for ext, files in all_files.items():
    print('Number of {} files: {}'.format(ext, len(files)))
That said, let me point out some issues with your code:
1. filter is a built-in function; don't use it as a variable name.
2. There is no need to loop over filenames and append one element at a time. Use the list.extend() method instead.
3. There is no need to create a new list (with the odd name newstr, which suggests it is a str object). If you want the length of the list of all files, just call len() on that list.
4. You create a function for each extension/file type. Although it is not efficient to iterate over the list of all files once per file type, you can at least do it with one generic function. Just pass a second argument: the file extension(s).
import os
from pathlib import Path

def get_files(all_files, extensions):
    # Keep only the names whose suffix (e.g. '.jpg') is one of the given extensions.
    return [item for item in all_files if Path(item).suffix in extensions]

all_files = []
for root, folders, files in os.walk(r'C:\Users\BKolev\Desktop'):
    all_files.extend(files)

jpeg_files = get_files(all_files, ['.jpeg', '.jpg'])
print('Number of jpeg files: {}'.format(len(jpeg_files)))
Here I use the pathlib module instead of os.path to show a different tool; you could also use it in the previous snippet.
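For illustration, this is roughly how the defaultdict snippet could look with pathlib doing the directory walk as well. Treat it as a sketch: the function name `group_by_suffix` is made up, and lowercasing the suffix is an extra assumption so that '.JPG' and '.jpg' land in the same group.

```python
from collections import defaultdict
from pathlib import Path


def group_by_suffix(folder):
    """Group file names under folder by lowercased suffix without the dot.

    Hypothetical helper; files without a suffix end up under ''.
    """
    groups = defaultdict(list)
    # rglob("*") walks the folder and all of its subfolders.
    for path in Path(folder).rglob("*"):
        if path.is_file():
            groups[path.suffix.lower().lstrip(".")].append(path.name)
    return groups
```

The result can be printed the same way as before, with a loop over groups.items() and len(files) for each extension.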