Python Forum
Loop through all files in a directory? - Printable Version



Loop through all files in a directory? - Winfried - Apr-22-2024

Hello,

I need to find all the files located under a given directory, including all the sub-directories it may contain.

Is this a/the right way to do it?

import os
import glob

for filename in glob.iglob(r".\parent\**", recursive=True):
  #ignore sub-directories, just grab files
  if not os.path.isdir(filename):
    print(filename)
Thank you.


RE: Loop through all files in a directory? - Gribouillis - Apr-22-2024

You could use
import os
root = '.'
files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
print(files)



RE: Loop through all files in a directory? - DeaD_EyE - Apr-22-2024

Use Path from the pathlib module.
This is the modern way to handle paths.

Almost all of the path-handling functions in os and os.path are also available as methods on the Path object. A few are missing, but Path is very useful for handling paths across different operating systems.

Path.home()
returns a PosixPath of the home directory on Linux.
On Windows it returns a WindowsPath, which points to the right home directory.
I think macOS is similar to Linux.
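
As a rough illustration of the point about os.path functions having Path counterparts, here is a minimal sketch. The path "projects/demo/notes.txt" is a made-up example and does not need to exist; the pairs shown are only a small selection.

```python
import os
from pathlib import Path

# Hypothetical relative path, used only for illustration.
raw = os.path.join("projects", "demo", "notes.txt")
p = Path("projects") / "demo" / "notes.txt"

# Each pair holds an os.path result and its pathlib counterpart.
pairs = {
    "basename / name": (os.path.basename(raw), p.name),
    "splitext / suffix": (os.path.splitext(raw)[1], p.suffix),
    "dirname / parent": (os.path.dirname(raw), str(p.parent)),
    "exists / exists()": (os.path.exists(raw), p.exists()),
    "isabs / is_absolute()": (os.path.isabs(raw), p.is_absolute()),
}

for label, (old, new) in pairs.items():
    print(f"{label}: {old!r} == {new!r} -> {old == new}")
```

The / operator on Path objects does the same job as os.path.join and picks the right separator for the operating system.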

from collections.abc import Generator
from pathlib import Path


def get_files(root: str | Path) -> Generator[Path, None, None]:
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path


all_files = list(get_files(Path.home().joinpath("Downloads/")))



RE: Loop through all files in a directory? - Winfried - Apr-23-2024

Thank you.

(Apr-22-2024, 07:10 PM)Gribouillis Wrote:
files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]

I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?

def somefunc(root):
  for d, _, files in os.walk(root):
    for f in files:
      return os.path.join(d, f)
(Apr-22-2024, 07:17 PM)DeaD_EyE Wrote:
from collections.abc import Generator
from pathlib import Path

def get_files(root: str | Path) -> Generator[Path, None, None]:
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path

all_files = list(get_files(Path.home().joinpath("Downloads/")))

I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to read up on it.


RE: Loop through all files in a directory? - Pedroski55 - Apr-23-2024

I agree with DeaD_EyE: Path is very useful because it works across different operating systems, and it has other tricks up its sleeve!

Filtering with if filename.is_file() means you won't pick up directories, and since iterdir() does not recurse, you won't pick up their contents either.

from pathlib import Path
import sys 
 
# mydir looks like: PosixPath('/home/pedro/temp')
mydir = Path('/home/pedro/temp')
# create a generator: filelist <generator object <genexpr> at 0x7ad729970900>
filelist = (filename for filename in mydir.iterdir() if filename.is_file())
# create a list of files
file_list = [filename for filename in mydir.iterdir() if filename.is_file()]

# have a look at the files
for f in filelist:
    print(f)

# use Path to look at the files
# need to recreate the generator
filelist = (filename for filename in mydir.iterdir() if filename.is_file())

for filename in filelist:
    print(f"\nfilename: {filename.name}")
    print(f"file suffix: {filename.suffix}")
    print(f"full path: {filename.resolve()}")
    print(f"filepath parts: {filename.parts}")
The advantage of the generator is size. Imagine you had millions of files. The list would be very big in memory. The generator is tiny!

sys.getsizeof(filelist)  # returns 104
sys.getsizeof(file_list)  # returns 472, nearly 5 times bigger than filelist
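
To make the size argument a bit more concrete, here is a small sketch that uses plain integers instead of real files. The helper name sizes is made up for this example. It also shows why the generator above had to be recreated before the second loop.

```python
import sys


def sizes(n):
    """Return (generator size, list size) in bytes for n items."""
    gen = (i for i in range(n))
    lst = [i for i in range(n)]
    return sys.getsizeof(gen), sys.getsizeof(lst)


small_gen, small_list = sizes(10)
big_gen, big_list = sizes(100_000)

print(small_gen, big_gen)    # the generator size does not depend on n
print(small_list, big_list)  # the list grows with the number of items

# A generator can only be consumed once.
gen = (i for i in range(3))
first_pass = list(gen)   # [0, 1, 2]
second_pass = list(gen)  # [] because the generator is already exhausted
print(first_pass, second_pass)
```

Note that sys.getsizeof() on a list only counts the list's own array of references, not the objects it points to, so the real memory use of a big list of Path objects is higher than the number it reports.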



RE: Loop through all files in a directory? - Gribouillis - Apr-23-2024

(Apr-23-2024, 02:02 AM)Winfried Wrote: I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?
It is not exactly equivalent because the one-liner here produces a list, so it would be
def somefunc(root):
    result = []
    for d, _, files in os.walk(root):
        for f in files:
            result.append(os.path.join(d, f))
    return result
You could also write a generator and also you can have it produce Path instances if you want
import os
from pathlib import Path

def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p/f
The same generator as a one-liner
files = (p/f for d, _, files in os.walk('.') for p in (Path(d),) for f in files)
Advice: use 4 spaces to indent Python code. Better: use the black utility to format your code automatically.
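
To see the difference between a return inside the loop and building a list, here is a minimal sketch that runs both against a throwaway directory. The function names first_only and all_files and the file names are made up for this example.

```python
import os
import tempfile


def first_only(root):
    # A return inside the loop leaves the function at the first file found.
    for d, _, files in os.walk(root):
        for f in files:
            return os.path.join(d, f)


def all_files(root):
    # The list comprehension collects every file before returning.
    return [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]


# Temporary directory with three empty files, one of them in a sub-directory.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        open(os.path.join(tmp, name), "w").close()

    one = first_only(tmp)
    many = all_files(tmp)

print(type(one).__name__, len(many))  # prints: str 3
```

Replacing return with yield in first_only would turn it into a generator that produces all three paths one at a time.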


RE: Loop through all files in a directory? - Winfried - Apr-23-2024

Thanks much. The list I'll work on is tiny enough that I don't need a generator, but it's nice to know that it's available. Ditto for one liners and function annotations.


RE: Loop through all files in a directory? - snippsat - Apr-23-2024

(Apr-23-2024, 06:17 AM)Gribouillis Wrote: You could also write a generator and also you can have it produce Path instances if you want
import os
from pathlib import Path
 
def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p/f
Mixing os and pathlib may be more confusing than it needs to be; this does just the same with pathlib alone.
from pathlib import Path

def somefunc(root):
    for path in Path(root).rglob('*'):
        if path.is_file():
            yield path

Winfried, in its simplest form, here is a stripped-down version of what DeaD_EyE's code does.
This will recursively scan for all .txt files in the folder Test; to match all files, use rglob('*').
from pathlib import Path

dest = r'C:\Test'
for path in Path(dest).rglob('*.txt'):
    if path.is_file():
        print(path)
Quote: I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to read up on it.
Look at "Support for type hints" (the typing module documentation); type hints can make it clearer what the code takes as input and what output to expect.
They can work as better documentation of the code, and editors show them in autocompletion, e.g. when you hover over get_files or call help() to see what it does.
They have no impact when the code is run by Python; you have to use a tool such as Mypy to do a static type check.
So, as an example of what I mean by not needed, this will work fine.
from pathlib import Path

def get_files(root):
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*.txt"):
        if path.is_file():
            yield path

dest = r'C:\Test'
for path in get_files(dest):
    print(path)
So it is less code, but you also lose the information about what root accepts as input, e.g. that it can take either a str or a Path.
Type hints are now used as standard at many of the biggest workplaces that use Python, and also in many third-party libraries.
E.g. FastAPI
Doc Wrote:FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.8+ based on standard Python type hints.



RE: Loop through all files in a directory? - Gribouillis - Apr-23-2024

(Apr-23-2024, 08:11 AM)snippsat Wrote: Mixing os and pathlib may be more confusing than it needs to be; this does just the same with pathlib alone.
What I don't like about the pathlib solution is the call to .is_file() for every path, which makes a system call, while os.walk() produces the list of files through an internal mechanism. This needs to be checked, but I suspect that os.walk() makes fewer system calls.
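
For background on that internal mechanism: since Python 3.5, os.walk() is built on os.scandir(), whose DirEntry objects can usually report whether an entry is a file or a directory from the directory listing itself, without a separate stat call. Here is a minimal sketch of a recursive walker written directly on os.scandir(); the function name scan_files and the test files are made up for this example, and it is not how os.walk() itself is written.

```python
import os
import tempfile


def scan_files(root):
    """Recursively yield file paths below root using os.scandir()."""
    with os.scandir(root) as entries:
        for entry in entries:
            # Do not follow symlinked directories, to avoid loops.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_files(entry.path)
            elif entry.is_file():
                yield entry.path


# Temporary tree with one file per level.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in ("top.txt",
                os.path.join("a", "mid.txt"),
                os.path.join("a", "b", "deep.txt")):
        open(os.path.join(tmp, rel), "w").close()

    found = sorted(os.path.relpath(p, tmp) for p in scan_files(tmp))

print(found)
```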


RE: Loop through all files in a directory? - DeaD_EyE - Apr-23-2024

Type hints are not required, but they help library developers communicate what a method or function expects and what it should return.

Quote: What I don't like about the pathlib solution is the call to .is_file() for every path, which makes a system call, while os.walk() produces the list of files through an internal mechanism. This needs to be checked, but I suspect that os.walk() makes fewer system calls.

It does make fewer calls, and it is faster.

Output:
public@ganymed:~$ for script in walk?.py; do echo -n "$script: "; strace -e newfstatat python3 $script 2>&1 | wc -l; done
walk1.py: 27556
walk2.py: 129893
walk3.py: 11484
Output:
public@ganymed:~$ for script in walk?.py; do echo -n "$script: "; time python3 $script; echo ; done
walk1.py:
real 0m0,351s
user 0m0,192s
sys 0m0,160s

walk2.py:
real 0m1,208s
user 0m0,883s
sys 0m0,325s

walk3.py:
real 0m0,285s
user 0m0,179s
sys 0m0,106s
pathlib.Path.walk was added in Python 3.12: https://docs.python.org/3/library/pathlib.html#pathlib.Path.walk
It is similar to os.walk, but differs in how it handles symlinks.

The last example (walk3.py) uses Path.walk().
walk1.py
import os


for root, dirs, files in os.walk("/usr"):
    for file in files:
        ...
walk2.py
from pathlib import Path


for element in Path("/usr").rglob("*"):
    element.is_file()
walk3.py
from pathlib import Path


for root, dirs, files in Path("/usr").walk():
    for file in files:
        pass
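
Building on walk3.py, here is a rough sketch of a generator that yields Path objects without calling is_file() on each one. The name walk_files is made up for this example; it uses Path.walk() when it is available (Python 3.12 and later) and falls back to os.walk() otherwise.

```python
import os
import tempfile
from pathlib import Path


def walk_files(root):
    """Yield every file below root as a Path, without calling is_file()."""
    root = Path(root)
    if hasattr(root, "walk"):  # Python 3.12 and later
        for dirpath, _dirnames, filenames in root.walk():
            for name in filenames:
                yield dirpath / name
    else:  # older versions: fall back to os.walk
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield Path(dirpath, name)


# Temporary tree with two files, one of them in a sub-directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "one.txt").touch()
    (Path(tmp) / "sub" / "two.txt").touch()

    found = sorted(p.name for p in walk_files(tmp))

print(found)  # prints ['one.txt', 'two.txt']
```

One caveat, going by the pathlib documentation: with the default follow_symlinks=False, Path.walk() lists symlinks to directories among the file names, whereas os.walk() lists them among the directory names, so the two branches can differ on trees that contain such links.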