Posts: 212
Threads: 94
Joined: Aug 2018
Hello,
I need to find all the files located under a given directory, including all the sub-directories it may contain.
Is this a/the right way to do it?
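import os
import glob
# os.path.join puts a separator between the folder and "**"
# (r".\parent" '**/**' concatenates to ".\parent**/**", which has none)
for filename in glob.iglob(os.path.join(".", "parent", "**"), recursive=True):
    # ignore sub-directories, just grab files
    if not os.path.isdir(filename):
        print(filename)
Thank you.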
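A minimal sketch of the same glob approach wrapped in a function (list_files_glob is a hypothetical name), run against a throwaway directory tree so you can see that it reaches sub-directories. One caveat worth knowing: glob skips names that start with a dot unless you pass include_hidden=True (Python 3.11+).

```python
import glob
import os
import tempfile


def list_files_glob(root):
    """Return every file under root, including those in sub-directories."""
    # os.path.join puts the right separator between root and "**";
    # with recursive=True, "**" matches any number of directory levels.
    pattern = os.path.join(root, "**")
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]


# Try it on a throwaway tree: a.txt at the top level, b.txt one level down.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    found = sorted(os.path.relpath(p, tmp) for p in list_files_glob(tmp))

print(found)
```

os.path.isfile() is used instead of "not os.path.isdir()" so that anything that is neither a file nor a directory is also left out.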
Posts: 4,790
Threads: 76
Joined: Jan 2018
You could use
import os
root = '.'
files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
print(files)
« We can solve any problem by introducing an extra level of indirection »
Posts: 2,126
Threads: 11
Joined: May 2017
Use Path from the pathlib module.
This is the modern way to handle paths.
Almost all of the path-handling functions in os and os.path are available as methods on a Path object. A few are missing, but Path is very useful for handling paths across different operating systems.
Path.home() returns a PosixPath of the home directory on Linux.
On Windows it returns a WindowsPath, which points to the right home directory.
I think macOS is similar to Linux.
from collections.abc import Generator
from pathlib import Path
def get_files(root: str | Path) -> Generator[Path, None, None]:
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path
all_files = list(get_files(Path.home().joinpath("Downloads/")))
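To see how the same path-building code adapts to each platform without needing both machines, here is a small sketch using the "pure" path classes, which do no file-system access. The home directories below are made-up examples; Path.home() would give you the real, concrete one for your OS.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths do no I/O, so both flavours can be built on any OS.
posix_home = PurePosixPath("/home/pedro")        # hypothetical Linux home
windows_home = PureWindowsPath("C:/Users/pedro")  # hypothetical Windows home; "/" is accepted

# The "/" operator and joinpath() build the same path.
posix_downloads = posix_home / "Downloads"
windows_downloads = windows_home.joinpath("Downloads")

print(posix_downloads)    # /home/pedro/Downloads
print(windows_downloads)  # C:\Users\pedro\Downloads
```

The Windows flavour normalises the separators to backslashes when converted to a string, which is exactly the bookkeeping you would otherwise do by hand with os.path.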
Posts: 212
Threads: 94
Joined: Aug 2018
Thank you.
(Apr-22-2024, 07:10 PM)Gribouillis Wrote: files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?
def somefunc(root):
    for d, _, files in os.walk(root):
        for f in files:
            return os.path.join(d, f)
(Apr-22-2024, 07:17 PM)DeaD_EyE Wrote:
from collections.abc import Generator
from pathlib import Path
def get_files(root: str | Path) -> Generator[Path, None, None]:
    """
    Generator, which recursively yields all files of a directory
    """
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path
all_files = list(get_files(Path.home().joinpath("Downloads/")))
I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to read up on it.
Posts: 1,094
Threads: 143
Joined: Jul 2017
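I agree with DeaD_EyE: Path is very useful, because it works across different operating systems and it has other tricks up its sleeve!
Filtering with if filename.is_file() means you won't pick up directories. Note that iterdir() only lists the top level of mydir, so files inside sub-directories are not included either; use mydir.rglob('*') instead of mydir.iterdir() if you need them.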
from pathlib import Path
import sys
# mydir looks like: PosixPath('/home/pedro/temp')
mydir = Path('/home/pedro/temp')
# create a generator: filelist <generator object <genexpr> at 0x7ad729970900>
filelist = (filename for filename in mydir.iterdir() if filename.is_file())
# create a list of files
file_list = [filename for filename in mydir.iterdir() if filename.is_file()]
# have a look at the files
for f in filelist:
    print(f)
# use Path to look at the files
# need to recreate the generator, because the loop above exhausted it
filelist = (filename for filename in mydir.iterdir() if filename.is_file())
for filename in filelist:
    print(f"\nfilename: {filename.name}")
    print(f"file suffix: {filename.suffix}")
    print(f"full path: {filename.resolve()}")
    print(f"filepath parts: {filename.parts}")
The advantage of the generator is size. Imagine you had millions of files: the list would be very big in memory, while the generator stays tiny!
sys.getsizeof(filelist)  # returns 104
sys.getsizeof(file_list)  # returns 472, about 4.5 times bigger than filelist
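A self-contained sketch of the two generator properties mentioned above: its footprint does not grow with the number of items, and it can be consumed only once. numbers() is a made-up example function. Exact byte counts vary between Python versions, so only comparisons are printed. Also note that sys.getsizeof() on a list counts only the list's own pointer array, not the Path objects it holds, so the real gap is larger than the numbers above suggest.

```python
import sys


def numbers(n):
    """Yield 0..n-1 one at a time instead of building a list."""
    for i in range(n):
        yield i


small_gen, big_gen = numbers(10), numbers(100_000)
small_list, big_list = list(range(10)), list(range(100_000))

# A generator's size does not depend on how many items it will produce...
print(sys.getsizeof(small_gen) == sys.getsizeof(big_gen))   # True
# ...while a list grows with its length.
print(sys.getsizeof(small_list) < sys.getsizeof(big_list))  # True

# A generator can be consumed only once, which is why the code above recreates it.
gen = numbers(3)
print(list(gen))  # [0, 1, 2]
print(list(gen))  # []
```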
Posts: 4,790
Threads: 76
Joined: Jan 2018
Apr-23-2024, 07:22 AM
(This post was last modified: Apr-23-2024, 07:22 AM by Gribouillis.)
(Apr-23-2024, 02:02 AM)Winfried Wrote: I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?
It is not exactly equivalent, because the one-liner here produces a list (and your version would return after the first file), so the equivalent would be:
def somefunc(root):
    result = []
    for d, _, files in os.walk(root):
        for f in files:
            result.append(os.path.join(d, f))
    return result
You could also write a generator, and you can have it produce Path instances if you want:
import os
from pathlib import Path
def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p / f
The same generator as a one-liner:
files = (p / f for d, _, files in os.walk('.') for p in (Path(d),) for f in files)
Advice: use 4 spaces to indent Python code. Better: use the black utility to format your code automatically.
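If you want to convince yourself that the list comprehension, the explicit loop and the Path generator from this thread really find the same files, a quick check on a throwaway tree works. files_loop and files_gen are hypothetical names for the two functions (both called somefunc above).

```python
import os
import tempfile
from pathlib import Path


def files_loop(root):
    """Explicit-loop equivalent of the os.walk() list comprehension."""
    result = []
    for d, _, files in os.walk(root):
        for f in files:
            result.append(os.path.join(d, f))
    return result


def files_gen(root):
    """Generator version that yields Path objects instead of strings."""
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p / f


with tempfile.TemporaryDirectory() as root:
    # Three files at three different depths.
    Path(root, "sub", "deeper").mkdir(parents=True)
    for rel in ("a.txt", "sub/b.txt", "sub/deeper/c.txt"):
        Path(root, rel).write_text("x")

    one_liner = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
    # Compare as Path objects so string formatting differences don't matter.
    same = (
        sorted(map(Path, one_liner))
        == sorted(map(Path, files_loop(root)))
        == sorted(files_gen(root))
    )
    count = len(one_liner)

print(same, count)  # True 3
```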
Posts: 212
Threads: 94
Joined: Aug 2018
Apr-23-2024, 07:52 AM
(This post was last modified: Apr-23-2024, 07:52 AM by Winfried.)
Thanks much. The list I'll work on is tiny enough that I don't need a generator, but it's nice to know that it's available. Ditto for one-liners and function annotations.
Posts: 7,320
Threads: 123
Joined: Sep 2016
Apr-23-2024, 08:11 AM
(This post was last modified: Apr-23-2024, 08:11 AM by snippsat.)
(Apr-23-2024, 06:17 AM)Gribouillis Wrote: You could also write a generator and also you can have it produce Path instances if you want
import os
from pathlib import Path
def somefunc(root):
    for d, _, files in os.walk(root):
        p = Path(d)
        for f in files:
            yield p/f
Mixing os and pathlib may be more confusing than it needs to be; this does the same with pathlib alone:
from pathlib import Path
def somefunc(root):
    for path in Path(root).rglob('*'):
        if path.is_file():
            yield path
In its simplest form, Winfried, here is a stripped-down version of what DeaD_EyE's code does.
This recursively scans the folder Test for all .txt files; to get all files, use rglob('*').
from pathlib import Path
dest = r'C:\Test'
for path in Path(dest).rglob('*.txt'):
    if path.is_file():
        print(path)
Quote: I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to do some read up.
Look at the typing module documentation (Support for type hints): type hints make it clearer what a function takes as input and what output to expect.
They can work as better documentation of the code, and editors show them in autocompletion, e.g. on mouse-over or when you use help(get_files).
They have no effect when you run the code with Python; you have to use a tool such as mypy to do a static type check.
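A tiny sketch showing that annotations are not enforced at runtime. double() is a hypothetical function made up for this illustration; a static checker such as mypy would flag the second call, but Python itself runs it.

```python
def double(x: int) -> int:
    """Annotated as taking and returning an int."""
    return x * 2


print(double(21))    # 42
print(double("ab"))  # abab  (a str passes straight through at runtime)

# The annotations are simply stored on the function object.
print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
```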
So, as an example of them not being needed, this works fine:
from pathlib import Path
def get_files(root):
    """
    Generator, which recursively yields all .txt files of a directory
    """
    for path in Path(root).rglob("*.txt"):
        if path.is_file():
            yield path
dest = r'C:\Test'
for path in get_files(dest):
    print(path)
That is less code, but you also lose the information about what root can take as input, e.g. that it accepts either a str or a Path.
Type hints are now used as standard in many of the biggest workplaces that use Python, and in many third-party libraries.
E.g. FastAPI:
Doc Wrote:FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.8+ based on standard Python type hints .
Posts: 4,790
Threads: 76
Joined: Jan 2018
(Apr-23-2024, 08:11 AM)snippsat Wrote: To mix os and pathlib may be more confusing than it needs to be, this does just the same with pathlib alone.
What I don't like with the pathlib solution is the call to .is_file() for every path, which makes a system call, while os.walk() produces the list of files with an internal mechanism. This needs to be checked, but I suspect that os.walk() makes fewer system calls.
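One likely reason for the difference: since Python 3.5, os.walk() is built on os.scandir(), and the os docs state that DirEntry.is_dir()/is_file() usually need no extra system call for non-symlinks, because the type comes from the directory listing itself. A minimal sketch of a scandir-based walker to illustrate the idea (walk_files is a hypothetical name, not a stdlib function):

```python
import os
import tempfile


def walk_files(root):
    """Recursively yield file paths using os.scandir(), the building block of os.walk()."""
    with os.scandir(root) as entries:
        for entry in entries:
            # DirEntry caches type information from the directory listing, so
            # for non-symlinks these checks usually need no extra stat() call.
            if entry.is_dir(follow_symlinks=False):
                yield from walk_files(entry.path)
            elif entry.is_file():
                yield entry.path


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    found = sorted(os.path.relpath(p, tmp) for p in walk_files(tmp))

print(found)
```

This simple recursive version keeps each parent directory handle open while it descends; os.walk() avoids that by reading a whole directory listing before moving on, so for real use os.walk() (or Path.walk()) is still the better choice.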
Posts: 2,126
Threads: 11
Joined: May 2017
Type hints are not required, but they help library developers communicate what a method/function expects and what it returns.
Quote:What I don't like with the pathlib solution is the call to .is_file() for every path, which does a system call while os.walk() produces the list of files with internal mechanism. This needs to be checked but I suspect that os.walk() does less system calls.
It does make fewer calls, and it is faster. Counting the newfstatat system calls of each script with strace:
Output: public@ganymed:~$ for script in walk?.py; do echo -n "$script: "; strace -e newfstatat python3 $script 2>&1 | wc -l; done
walk1.py: 27556
walk2.py: 129893
walk3.py: 11484
Output: public@ganymed:~$ for script in walk?.py; do echo -n "$script: "; time python3 $script; echo ; done
walk1.py:
real 0m0,351s
user 0m0,192s
sys 0m0,160s
walk2.py:
real 0m1,208s
user 0m0,883s
sys 0m0,325s
walk3.py:
real 0m0,285s
user 0m0,179s
sys 0m0,106s
pathlib.Path.walk was added in Python 3.12: https://docs.python.org/3/library/pathli....Path.walk
It is similar to os.walk, but differs in how it handles symlinks.
The last example (walk3.py) uses Path.walk().
walk1.py
import os
for root, dirs, files in os.walk("/usr"):
    for file in files:
        ...
walk2.py
from pathlib import Path
for element in Path("/usr").rglob("*"):
    element.is_file()
walk3.py
from pathlib import Path
for root, dirs, files in Path("/usr").walk():
    for file in files:
        pass
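If your code must also run on Python versions older than 3.12, a sketch of a version-guarded helper (iter_files is a hypothetical name): it uses Path.walk() where available and otherwise wraps os.walk() so both branches yield Path objects. Per the pathlib docs, Path.walk() lists symlinks to directories among the filenames when follow_symlinks is false, so add an is_file() check if that matters for you.

```python
import os
import sys
import tempfile
from pathlib import Path


def iter_files(root):
    """Yield Path objects for all files under root, recursively."""
    root = Path(root)
    if sys.version_info >= (3, 12):
        walker = root.walk()  # yields (Path, dirnames, filenames)
    else:
        # os.walk() yields str dirpaths; wrap them so both branches match.
        walker = ((Path(d), dirs, files) for d, dirs, files in os.walk(root))
    for dirpath, _, filenames in walker:
        for name in filenames:
            yield dirpath / name


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("x")
    (Path(tmp) / "sub" / "b.txt").write_text("x")
    found = sorted(p.relative_to(tmp).as_posix() for p in iter_files(tmp))

print(found)  # ['a.txt', 'sub/b.txt']
```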