Python Forum
Question about os.walk() behaviour
#1
I'm trying to write a small program to calculate how much disk space each directory uses. It seems to work, except when it hits a directory that my account doesn't have access to. I understand why access fails (the OS doesn't allow it), but what puzzles me is that any such directory appears in the 'dirs' part of the tuple that os.walk() returns for its parent, yet never appears as the 'root' part of a tuple of its own. I'm running this under Windows 7 (64-bit) with Python 3.6.1 and Anaconda 4.4.0 (64-bit). I haven't tried it under Linux, so I don't know whether the same behaviour occurs there.

Here's my code. The first routine builds a dictionary with an entry for every directory and sub-directory under the directory given on the command line. Each entry is keyed by the directory name and is itself a dictionary holding the directory's level in the hierarchy, the number of files, the list of sub-directories, the size of the files directly inside it, the total space used (initialized to None), and the percentage of total space (also initialized to None). The second routine calculates the total space and the percentage of total space used by each directory.

import os, sys
from os.path import join, getsize

root = None
tree = None

def getTree(dir):
    tree = {}
    level = dir.count(os.sep)
    for root, dirs, files in os.walk(dir):
        size = sum(getsize(join(root,name)) for name in files)
        tree[root] = {
                "level":root.count(os.sep)-level,
                "files":len(files),
                "dirs":dirs,
                "size":size,
                "total":None,
                "percent":None}
    return tree

def getWeights(tree,dir):
    if tree[dir]["total"] is None:
        tree[dir]["percent"] = 100
        total = tree[dir]["size"]
        for subdir in tree[dir]["dirs"]:
            total += getWeights(tree,dir+os.sep+subdir)
        tree[dir]["total"] = total
        for subdir in tree[dir]["dirs"]:
            if dir+os.sep+subdir in tree:
                if tree[dir]["total"] > 0:
                    tree[dir+os.sep+subdir]["percent"] = 100.0 * \
                        tree[dir+os.sep+subdir]["total"] / tree[dir]["total"]
                else:
                    tree[dir+os.sep+subdir]["percent"] = 100.0
            else:
                tree[dir+os.sep+subdir]["percent"] = 0.0
    return tree[dir]["total"]

if __name__ == "__main__":

    root = sys.argv[1]
    root = root.replace('/',os.sep).replace('\\',os.sep)
    tree = getTree(root)
    total = getWeights(tree,root)
    print("Total space used: %d" % total)
    for dir in tree:
        print(dir,tree[dir])
What happens is that when the second routine reaches a directory I don't have access to, I get a KeyError: the directory isn't in the dictionary built by the first routine, even though it's in the 'dirs' list of the directory above it. This seems inconsistent to me (and it's a problem, because the program aborts on the KeyError), but maybe it's the expected behaviour. I'll have to find some way to avoid the dictionary lookup for directories I don't have access to (I haven't figured that part out yet). Maybe it would be easier to use os.scandir() instead, although I'm not sure whether it behaves the same way.
#2
You could remove the forbidden subdirectories between the two steps:
def remove_forbidden_subdirs(tree):
    for dir, info in tree.items():
        dirs = info['dirs']
        dirs[:] = [d for d in dirs if os.path.join(dir, d) in tree]
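Note that os.path.join(a, b, c) is better than a+os.sep+b+os.sep+c. os.walk builds its root paths with os.path.join, so manual concatenation can produce keys that don't match the ones in the dictionary, for example when the starting directory already ends with a separator.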
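
As a minimal sketch of how this behaves, the function can be run against a small hand-built dictionary shaped like the one getTree() returns. The directory names used here ('top', 'open', 'locked') are hypothetical; 'locked' stands in for a directory that os.walk listed under 'dirs' but never visited.

```python
import os

def remove_forbidden_subdirs(tree):
    # Keep only the subdirectories that os.walk actually visited,
    # i.e. those that have their own entry in the tree.
    for dir, info in tree.items():
        dirs = info['dirs']
        dirs[:] = [d for d in dirs if os.path.join(dir, d) in tree]

# Hypothetical tree: 'locked' appears in 'dirs' but has no entry of its own.
top = 'top'
tree = {
    top: {'dirs': ['open', 'locked'], 'size': 10},
    os.path.join(top, 'open'): {'dirs': [], 'size': 5},
}
remove_forbidden_subdirs(tree)
print(tree[top]['dirs'])  # ['open']
```

Because the list is modified in place with dirs[:], the entries in the tree see the change without any reassignment.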

This is normal behaviour for os.walk. A directory appears as the root part of a tuple only if the program is allowed to list its contents. When listing fails, os.walk ignores the error by default and moves on, so the directory shows up only in the dirs list of its parent.
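
To find out which directories were skipped, os.walk accepts an optional onerror callback that receives the OSError raised when a directory cannot be listed. The following is only a minimal sketch; the helper name walk_collecting_errors is hypothetical.

```python
import os

def walk_collecting_errors(top):
    """Walk `top` and return (visited, failed).

    visited: directories os.walk was able to list.
    failed:  paths whose listing raised OSError (e.g. permission denied).
    """
    failed = []

    def remember(error):
        # os.walk passes the OSError instance; its filename attribute
        # holds the path that could not be listed.
        failed.append(error.filename)

    visited = [root for root, dirs, files in os.walk(top, onerror=remember)]
    return visited, failed
```

The failed list could then be reported alongside the totals, so the reader knows the figures cover only the directories that were readable.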

Removing these directories from the sum means that your program won't show the actual disk space, but only the disk space that your account is allowed to browse. There can be a huge difference between the two.
#3
Thanks, I'll try that. I was going to add something like 'if dir in tree:' before the point where I got the KeyError, but maybe your solution is better.
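
A minimal sketch of that kind of guard, assuming the same dictionary layout that getTree() produces: subdirectories without an entry in the tree are skipped and contribute nothing to the total. The function name get_weights_guarded is hypothetical, and the sketch builds its keys with os.path.join.

```python
import os

def get_weights_guarded(tree, dir):
    """Fill in 'total' and 'percent', skipping unreadable subdirectories."""
    entry = tree[dir]
    if entry["total"] is None:
        entry["percent"] = 100.0
        # Only subdirectories that os.walk visited have their own entry.
        children = [os.path.join(dir, d) for d in entry["dirs"]]
        children = [c for c in children if c in tree]
        entry["total"] = entry["size"] + sum(
            get_weights_guarded(tree, c) for c in children)
        for c in children:
            if entry["total"] > 0:
                tree[c]["percent"] = 100.0 * tree[c]["total"] / entry["total"]
            else:
                tree[c]["percent"] = 100.0
    return entry["total"]
```

Unlike the filtering approach, this leaves the 'dirs' lists untouched, so the names of the inaccessible directories remain available for reporting.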
#4
This is a perfect application for pathlib
see: http://journalpanic.com/pyhi/posts/pathlib-and-ospath/
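
As a rough sketch of what a pathlib-based version might look like (this is not taken from the linked article; the function name dir_size and the choice to skip symbolic links are assumptions):

```python
from pathlib import Path

def dir_size(path):
    """Return (total_bytes, skipped) for the tree under `path`.

    skipped lists the paths that could not be read.
    """
    path = Path(path)
    total = 0
    skipped = []
    try:
        entries = list(path.iterdir())
    except OSError:
        # Typically PermissionError: report the directory instead of failing.
        return 0, [path]
    for entry in entries:
        try:
            if entry.is_symlink():
                continue  # assumption: do not follow links
            if entry.is_dir():
                sub_total, sub_skipped = dir_size(entry)
                total += sub_total
                skipped.extend(sub_skipped)
            else:
                total += entry.stat().st_size
        except OSError:
            skipped.append(entry)
    return total, skipped
```

Returning the skipped paths alongside the total makes it explicit that the figure covers only what the account is allowed to read.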