Jan-07-2018, 01:57 PM
I'm trying to write a small program to calculate how much disk space is used by each directory. It seems to work, except when it hits a directory that my account doesn't have access to. I can understand the reason for that (the OS doesn't allow it), but what puzzles me is that any such directory appears in the 'dirs' list of its parent's tuple, yet os.walk() never returns a tuple with that directory as 'root'. I'm running this under Windows 7 (64-bit), with Python 3.6.1 and Anaconda 4.4.0 (64-bit). I don't know whether the same behaviour occurs under Linux; I haven't tried it.
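One way to make the skipped directories visible: the os.walk() documentation states that errors raised while listing a directory are ignored by default, and that an optional onerror callback receives the OSError instead. That would be consistent with a directory showing up in its parent's 'dirs' list (the parent could be listed) but never as 'root' (the directory itself could not). The sketch below is only an illustration; the names walk_with_errors and remember are made up for this example.

```python
import os


def walk_with_errors(top):
    """Walk `top` and return (visited, errors).

    visited: every directory that os.walk actually yielded as `root`
    errors:  paths that os.walk failed to list (reported via onerror)
    """
    errors = []

    def remember(error):
        # os.walk passes the OSError raised while listing a directory;
        # error.filename holds the path that could not be read.
        errors.append(error.filename)

    visited = [root for root, dirs, files in os.walk(top, onerror=remember)]
    return visited, errors


if __name__ == "__main__":
    visited, errors = walk_with_errors(os.curdir)
    print("directories walked:", len(visited))
    for path in errors:
        print("could not list:", path)
```

Any path printed under "could not list" is one that would be missing from a dictionary keyed on 'root'.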
What happens is that when the second routine (getWeights) reaches a directory to which I don't have access, I get a KeyError, because the directory isn't in the dictionary built by the first routine (getTree), even though it's in the 'dirs' list of the directory above it. This seems inconsistent to me (and it's also a problem, because the program aborts on the KeyError), but maybe it's the expected behaviour. I'll have to find some way to avoid doing a dictionary lookup for any directory that I don't have access to (I haven't figured that part out yet). Or maybe it would be easier to use os.scandir() instead, although I'm not sure whether it behaves the same way.
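On the os.scandir() idea: unlike os.walk(), os.scandir() does not swallow the error, so an unreadable directory raises an OSError (typically PermissionError) that can be caught explicitly. A minimal sketch of a recursive size calculation along those lines follows. The function name dir_size and the skipped list are assumptions made for this example, and I haven't run it against a protected Windows directory.

```python
import os


def dir_size(path, skipped=None):
    """Return the total size in bytes of the regular files under `path`.

    Directories or entries that cannot be read are appended to `skipped`
    (a list supplied by the caller) and counted as 0 bytes.
    """
    if skipped is None:
        skipped = []
    total = 0
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        total += dir_size(entry.path, skipped)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    # The entry exists but its metadata could not be read.
                    skipped.append(entry.path)
    except OSError:
        # Includes PermissionError: the directory could not be listed.
        skipped.append(path)
    return total


if __name__ == "__main__":
    skipped = []
    print("Total space used: %d" % dir_size(os.curdir, skipped))
    for path in skipped:
        print("skipped:", path)
```

Symbolic links are deliberately not followed here, so a link to a directory is neither descended into nor counted.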
Here's my code. The first routine (getTree) builds a dictionary with one entry for every directory and sub-directory under the directory specified on the command line. Each entry is itself a dictionary containing the directory's level in the hierarchy, its number of files, the list of its sub-directories, the space used by its own files, the total space used (initialized to None), and the percentage of total space (also initialized to None). The second routine (getWeights) calculates the total space and the percentage of total space used by each directory.
import os, sys
from os.path import join, getsize

root = None
tree = None

def getTree(dir):
    # Build one dictionary entry per directory that os.walk visits.
    tree = {}
    level = dir.count(os.sep)
    for root, dirs, files in os.walk(dir):
        size = sum(getsize(join(root, name)) for name in files)
        tree[root] = {
            "level": root.count(os.sep) - level,
            "files": len(files),
            "dirs": dirs,
            "size": size,
            "total": None,
            "percent": None
        }
    return tree

def getWeights(tree, dir):
    # Recursively fill in "total" and "percent" for dir and its sub-directories.
    if tree[dir]["total"] is None:
        tree[dir]["percent"] = 100
        total = tree[dir]["size"]
        for subdir in tree[dir]["dirs"]:
            # The KeyError occurs in this call when subdir was never visited by os.walk.
            total += getWeights(tree, dir + os.sep + subdir)
        tree[dir]["total"] = total
        for subdir in tree[dir]["dirs"]:
            if dir + os.sep + subdir in tree:
                if tree[dir]["total"] > 0:
                    tree[dir + os.sep + subdir]["percent"] = 100.0 * \
                        tree[dir + os.sep + subdir]["total"] / tree[dir]["total"]
                else:
                    tree[dir + os.sep + subdir]["percent"] = 100.0
            else:
                tree[dir + os.sep + subdir]["percent"] = 0.0
    return tree[dir]["total"]

if __name__ == "__main__":
    root = sys.argv[1]
    root = root.replace('/', os.sep).replace('\\', os.sep)
    tree = getTree(root)
    total = getWeights(tree, root)
    print("Total space used: %d" % total)
    for dir in tree:
        print(dir, tree[dir])
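A possible way to avoid the lookup for directories that never made it into the dictionary is to filter the 'dirs' list against the dictionary before recursing. The sketch below assumes the same dictionary shape that getTree() builds. The name get_weights is hypothetical and this is not a tested replacement for the program above. It also builds child keys with os.path.join, which is how os.walk() constructs its paths. As far as I can tell, dir + os.sep + subdir would produce a doubled separator (and so a key that isn't in the dictionary) when the starting directory already ends in a separator, such as a drive root.

```python
import os


def get_weights(tree, path):
    """Fill in "total" and "percent" for `path` and everything below it.

    Sub-directories listed in "dirs" but absent from `tree` (for example
    because os.walk could not list them) are skipped and add 0 bytes.
    """
    entry = tree[path]
    if entry["total"] is None:
        # Provisional value; a parent overwrites it after the recursion.
        entry["percent"] = 100.0
        total = entry["size"]
        # Keep only the children that os.walk actually visited.
        children = [os.path.join(path, name) for name in entry["dirs"]]
        children = [child for child in children if child in tree]
        for child in children:
            total += get_weights(tree, child)
        entry["total"] = total
        for child in children:
            if total > 0:
                tree[child]["percent"] = 100.0 * tree[child]["total"] / total
            else:
                # Same fallback as the original code for an empty parent.
                tree[child]["percent"] = 100.0
    return entry["total"]
```

With this approach an inaccessible directory simply contributes nothing to its parent's total, so the reported sizes are lower bounds rather than exact figures.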