Python Forum
why i don't like os.walk()
#1
i have decided that i don't like os.walk(). that means i will be looking to code my own file tree recursion generator. the biggest reason is that i find no means to modify the list of subdirectories that it uses to descend down to the next level, such as sorting them or removing selected subdirectories or deciding which ones are allowed to follow symlinks or descend into a different filesystem or owner space. i have already written this kind of thing in C (where existing tools were similarly limited) so the logic should be easy enough for me to do. fyi, my C recursion code even includes the ability to yield directories when ascending back up to them (e.g when all the descents are complete). this should be even easier in Python than in C since Python already has the means to do generators (which my C code was trying to fake).
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#2
Use pathlib

If you want all directory entries, including symbolic links etc., remove the is_dir() condition on dirs
and drop the separate file search; you will then have to inspect each entry individually to get its type.

from pathlib import Path

def plib_walk(dir):
    dirs = [x for x in dir.iterdir() if x.is_dir()]
    files = [x for x in dir.iterdir() if x.is_file()]

    print (f'\nDirectory: {dir.resolve()}')
    for file in files:
        print(f'    file: {file}')

    for pdir  in dirs:
        plib_walk(pdir)

def tryit():
    home = Path('.')
    # Start a level higher to make it interesting
    start = home / '..'
    print(f'starting with: {start.resolve()}')
    plib_walk(start)

if __name__ == '__main__':
    tryit()
If you want to open files to search within them, use:
with file.open() as f:
    text = f.read()
Reply
#3
(Jan-09-2018, 05:50 AM)Skaperen Wrote: i find no means to modify the list of subdirectories that it uses to descend down to the next level, such as sorting them or removing selected subdirectories
In principle, you can do this by changing the contents of the subdirectories list, for example
import os
for root, subdirs, files in os.walk('foo'):
    subdirs[:] = sorted(subdirs) # sorts the subdirs
    subdirs[:] = [d for d in subdirs if not d.startswith('tmp')] # remove some subdirs
    print(root, subdirs)
Reply
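The same in-place pruning can cover the "different filesystem" case from the first post. The following is a minimal sketch, not part of the standard library: walk_one_filesystem is a hypothetical helper name, and it assumes that comparing st_dev values is an adequate test for "same filesystem" on your platform.

```python
import os


def walk_one_filesystem(top):
    """Wrap os.walk(top), pruning subdirectories that live on another device.

    Hypothetical helper for illustration; entries that vanish between the
    directory listing and the lstat() call would raise OSError here.
    """
    top_dev = os.stat(top).st_dev
    for root, subdirs, files in os.walk(top):
        # Slice assignment edits the list object os.walk keeps using,
        # so directories dropped here are never descended into.
        subdirs[:] = sorted(
            d for d in subdirs
            if os.lstat(os.path.join(root, d)).st_dev == top_dev
        )
        yield root, subdirs, files
```

The caller can still prune subdirs further in its own loop, because os.walk only reads the list again when the next item is requested.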
#4
(Jan-09-2018, 07:44 AM)Gribouillis Wrote:
(Jan-09-2018, 05:50 AM)Skaperen Wrote: i find no means to modify the list of subdirectories that it uses to descend down to the next level, such as sorting them or removing selected subdirectories
In principle, you can do this by changing the contents of the subdirectories list, for example
import os
for root, subdirs, files in os.walk('foo'):
    subdirs[:] = sorted(subdirs) # sorts the subdirs
    subdirs[:] = [d for d in subdirs if not d.startswith('tmp')] # remove some subdirs
    print(root, subdirs)
so this affects the generator state? how? what does the [:] on the LHS do? can i delete a directory from the directory list to have it not descend that one?
Reply
#5
The generator reuses the reference to the list subdirs in subsequent steps. It means that any change of the list's content will be reflected in the subdirs' walk. Look at this code
>>> a = ['spam', 'eggs', 'ham']
>>> b = a
>>> a[:] = sorted(a)
>>> b
['eggs', 'ham', 'spam']
>>> 
>>> 
>>> a = ['spam', 'eggs', 'ham']
>>> b = a
>>> a = sorted(a)
>>> b
['spam', 'eggs', 'ham']
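Using a[:] = changes the existing list object in place, so every other reference to it (here b) sees the change. Using a = only rebinds the name a to a new list and leaves the original list untouched.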
Reply
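To see why this matters for a generator, here is a toy model. It is only a sketch of the idea, not os.walk's actual source; toy_walk and the tree dictionary are made up for illustration.

```python
def toy_walk(tree, root="top"):
    """Yield (name, children) pairs top-down, like a much simplified os.walk.

    `tree` maps a directory name to the list of its child directory names.
    """
    children = list(tree.get(root, []))
    yield root, children      # the caller receives this very list object
    for child in children:    # the generator then iterates the same object
        yield from toy_walk(tree, child)


tree = {"top": ["tmp1", "b", "a"], "a": ["a1"], "b": [], "tmp1": ["hidden"]}

visited = []
for name, children in toy_walk(tree):
    # In-place slice assignment: the generator sees the sorted, filtered list.
    children[:] = sorted(c for c in children if not c.startswith("tmp"))
    visited.append(name)

print(visited)  # ['top', 'a', 'a1', 'b']
```

If the loop body used children = sorted(...) instead, the generator would still hold the original list and would visit tmp1 and hidden as well.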
#6
os.walk does exactly what it should do.

If you need a sorted list of files, then sort with sorted.
If you need to exclude files from a list, then write your function for it.
If you want to check for symlinks, do it in the loop with os.path.islink.

silly example:
import os

def exclude_filter(file):
    # Exclude hidden files (names starting with a dot)
    return file.startswith('.')

for root, dirs, files in os.walk('.'):
    for file in sorted(files):
        file_path = os.path.join(root, file)
        if os.path.islink(file_path):
            continue
        if exclude_filter(file):
            continue
        print(file_path)
If you need a special kind of os.walk that does everything, sorry, you have to write your own.
What you want may already exist somewhere in a specialized library for this task.
For example, if you want to control the order in which directories are traversed, it gets complicated with this function.

This kind of super special function will never get into the standard library. If you ask 10 people, you will get 40 requests for this implementation.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#7
Well, if I want to traverse my /home/$USER directory and I want to exclude Music and Video I can just remove them from the list:

for root, dirs, files in os.walk('/home/victor/'):
    dirs = [item for item in dirs[:] if item not in ('Music', 'Video')]
It's completely safe to do that, and os.walk will not go through these directories.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
Reply
#8
(Jan-09-2018, 05:09 PM)wavic Wrote: dirs = [item for item in dirs[:] if item not in ('Music', 'Video')]
It works only if you write dirs[:] = .... The list object dirs must remain the same, even if its contents changes.
Reply
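The difference is easy to check against a throwaway directory tree. This is a small sketch; walked_dirs is a hypothetical helper and the directory names are only examples.

```python
import os
import tempfile


def walked_dirs(top, in_place):
    """Return the directories os.walk visits (relative to top), trying to skip Music and Video."""
    seen = []
    for root, dirs, files in os.walk(top):
        keep = sorted(d for d in dirs if d not in ("Music", "Video"))
        if in_place:
            dirs[:] = keep  # mutates the list os.walk holds: pruning takes effect
        else:
            dirs = keep     # rebinds the local name only: os.walk's list is untouched
        seen.append(os.path.relpath(root, top))
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    for name in ("Music/albums", "Video", "Documents"):
        os.makedirs(os.path.join(top, name))
    print("dirs = ...    ", walked_dirs(top, in_place=False))
    print("dirs[:] = ... ", walked_dirs(top, in_place=True))
```

Only the dirs[:] = variant leaves Music and Video out of the visited directories.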
#9
so ...

the [:] on the LHS uses a whole list slice assignment to update the existing reference and avoid assigning a new reference. and each dirlist reference i get from each step of the generator i get from os.walk() is kept internal in the generator and subsequently used, as modified in-place, to descend in the tree walk.

(Jan-09-2018, 05:09 PM)wavic Wrote: Well, if I want to traverse my /home/$USER directory and I want to exclude Music and Video I can just remove them from the list:

for root, dirs, files in os.walk('/home/victor/'):
    dirs = [item for item in dirs[:] if item not in ('Music', 'Video')]
It's completely safe to do that, and os.walk will not go through these directories.

so ...
for root, dirs, files in os.walk('/home/victor/'):
    dirs[:] = [item for item in dirs if item not in ('Music', 'Video')]
it builds a new list without those 2 names, which is thus 2 elements shorter, and is copied (because of the slice reference on the LHS) into the whole list (because the slice reference is a whole reference) making that list to now be 2 elements shorter.

or i could do ...
for root, dirs, files in os.walk('/home/victor/'):
    dirs[:] = sorted(item for item in dirs if item not in ('Music', 'Video'))
Reply
#10
i am still running into a problem. i can successfully sort the tree, but the descent is not what i want. when the descent reaches some directory in the middle (every directory has this issue but you won't see it on leaf directories), that is the only opportunity to yield the members of that directory. if it has a subdirectory, i want to yield, one file object at a time, the entire subtree of it before yielding anything after it at that level.
Reply
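One way to get that ordering is a small recursive generator built on os.scandir instead of os.walk. This is only a sketch under assumptions: walk_entries and its post_order flag are hypothetical names, entries are sorted by name, and symlinked directories are not followed unless asked.

```python
import os


def walk_entries(top, follow_symlinks=False, post_order=False):
    """Yield paths under top one at a time, depth-first, in sorted order.

    A directory's entire subtree is yielded before its next sibling.
    With post_order=True a directory is yielded on the way back up,
    after its contents, instead of before them.
    """
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=follow_symlinks):
            if not post_order:
                yield entry.path
            yield from walk_entries(entry.path, follow_symlinks, post_order)
            if post_order:
                yield entry.path
        else:
            yield entry.path
```

Usage would look like for path in walk_entries('.'): print(path). Filtering or a different sort key can be applied to entries before the loop, which is the hook the first post was asking for.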

