Loop through all files in a directory?
Python Forum (https://python-forum.io), Forum: General Coding Help, Thread: /thread-42010.html
Loop through all files in a directory? - Winfried - Apr-22-2024

Hello,

I need to find all the files located under a given directory, including all the sub-directories it may contain. Is this a/the right way to do it?

    import os
    import glob

    for filename in glob.iglob(r".\parent" '**/**', recursive=True):
        # ignore sub-directories, just grab files
        if not os.path.isdir(filename):
            print(filename)

Thank you.

RE: Loop through all files in a directory? - Gribouillis - Apr-22-2024

You could use

    import os

    root = '.'
    files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
    print(files)

RE: Loop through all files in a directory? - DeaD_EyE - Apr-22-2024

Use Path from the pathlib module. This is the modern way to handle paths. Almost all of the path-handling functions in os and os.path are available as methods on the Path object. Some are missing, but it's very useful for handling paths across different operating systems. Path.home() returns a PosixPath of the home directory on Linux. On Windows it returns a WindowsPath, which points to the right home directory. I think macOS is similar to Linux.

    from collections.abc import Generator
    from pathlib import Path


    def get_files(root: str | Path) -> Generator[Path, None, None]:
        """
        Generator, which recursively yields all files of a directory
        """
        for path in Path(root).rglob("*"):
            if path.is_file():
                yield path


    all_files = list(get_files(Path.home().joinpath("Downloads/")))

RE: Loop through all files in a directory? - Winfried - Apr-23-2024

Thank you.

(Apr-22-2024, 07:10 PM) Gribouillis Wrote:

    files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]

I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?
    def somefunc(root):
        for d, _, files in os.walk(root):
            for f in files:
                return os.path.join(d, f)

(Apr-22-2024, 07:17 PM) DeaD_EyE Wrote:

    from collections.abc import Generator
    from pathlib import Path


    def get_files(root: str | Path) -> Generator[Path, None, None]:
        """
        Generator, which recursively yields all files of a directory
        """
        for path in Path(root).rglob("*"):
            if path.is_file():
                yield path


    all_files = list(get_files(Path.home().joinpath("Downloads/")))

I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to do some reading up.

RE: Loop through all files in a directory? - Pedroski55 - Apr-23-2024

I agree with DeaD_EyE, Path is very useful, because it works on different operating systems and it has other tricks up its sleeve! if filename.is_file() means you won't pick up directories or their contents.

    from pathlib import Path
    import sys

    # mydir looks like: PosixPath('/home/pedro/temp')
    mydir = Path('/home/pedro/temp')

    # create a generator: filelist <generator object <genexpr> at 0x7ad729970900>
    filelist = (filename for filename in mydir.iterdir() if filename.is_file())

    # create a list of files
    file_list = [filename for filename in mydir.iterdir() if filename.is_file()]

    # have a look at the files
    for f in filelist:
        print(f)

    # use Path to look at the files
    # need to recreate the generator, because the loop above exhausted it
    filelist = (filename for filename in mydir.iterdir() if filename.is_file())
    for filename in filelist:
        print(f"\nfilename: {filename.name}")
        print(f"file suffix: {filename.suffix}")
        print(f"full path: {filename.resolve()}")
        print(f"filepath parts: {filename.parts}")

The advantage of the generator is size. Imagine you had millions of files. The list would be very big in memory. The generator is tiny!

    sys.getsizeof(filelist)   # returns 104
    sys.getsizeof(file_list)  # returns 472, nearly 5 times bigger than filelist

RE: Loop through all files in a directory?
- Gribouillis - Apr-23-2024

(Apr-23-2024, 02:02 AM) Winfried Wrote: I'm not used to one-liners. Am I correct in understanding it's the equivalent of the following?

It is not exactly equivalent because the one-liner here produces a list, so it would be

    def somefunc(root):
        result = []
        for d, _, files in os.walk(root):
            for f in files:
                result.append(os.path.join(d, f))
        return result

You could also write a generator, and you can have it produce Path instances if you want

    import os
    from pathlib import Path

    def somefunc(root):
        for d, _, files in os.walk(root):
            p = Path(d)
            for f in files:
                yield p/f

The same generator as a one-liner

    files = (p/f for d, _, files in os.walk('.') for p in (Path(d),) for f in files)

Advice: use 4 spaces to indent Python code. Better: use the black utility to format your code automatically.

RE: Loop through all files in a directory? - Winfried - Apr-23-2024

Thanks much. The list I'll work on is tiny enough that I don't need a generator, but it's nice to know that it's available. Ditto for one-liners and function annotations.

RE: Loop through all files in a directory? - snippsat - Apr-23-2024

(Apr-23-2024, 06:17 AM) Gribouillis Wrote: You could also write a generator and also you can have it produce Path instances if you want

Mixing os and pathlib may be more confusing than it needs to be; this does just the same with pathlib alone.

    from pathlib import Path

    def somefunc(root):
        for path in Path(root).rglob('*'):
            if path.is_file():
                yield path

In its simplest form, Winfried, here is a stripped-down version of what DeaD_EyE's code does. This will recursively scan for all .txt files in the folder Test; for all files the pattern would be rglob('*').

    from pathlib import Path

    dest = r'C:\Test'
    for path in Path(dest).rglob('*.txt'):
        if path.is_file():
            print(path)

Quote: I've never seen that syntax (def get_files(root: str | Path) -> Generator[Path, None, None]:). I'll have to do some read up.

Look at Support for type hints. Type hints can make it clearer what the code takes as input and what output to expect.
They can work as better documentation of the code, and editors also show them in autocompletion, e.g. when you mouse over get_files or call help on it to see what it does. They have no impact on running the code with Python; you have to use a tool such as Mypy to do a static type check. So as an example of what I mean by "not needed", this will work fine.

    from pathlib import Path

    def get_files(root):
        """
        Generator, which recursively yields all files of a directory
        """
        for path in Path(root).rglob("*.txt"):
            if path.is_file():
                yield path

    dest = r'C:\Test'
    for path in get_files(dest):
        print(path)

So it's less code, but it also loses the information about what root can take as input, e.g. that it accepts both str and Path. Type hints are now used as standard in many of the biggest workplaces that use Python, and also in many third-party libraries. E.g.

FastAPI Doc Wrote: FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.8+
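As a small illustration of the point that type hints do not affect how Python runs the code, here is a minimal sketch. The helper `double` and the temporary directory are made up for this example and are not from the thread; `from __future__ import annotations` is used only so that the `str | Path` hint also works on Python versions before 3.10.

```python
from __future__ import annotations  # lets "str | Path" work before Python 3.10

import tempfile
from collections.abc import Iterator
from pathlib import Path


def get_files(root: str | Path) -> Iterator[Path]:
    """Recursively yield all files below root."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield path


def double(x: int) -> int:  # hypothetical helper, only for this demo
    return x * 2


# Build a tiny throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("a")
    (Path(tmp) / "sub" / "b.txt").write_text("b")

    # Both a str and a Path are accepted, as the hint advertises.
    from_str = sorted(p.name for p in get_files(tmp))
    from_path = sorted(p.name for p in get_files(Path(tmp)))

# Python does not enforce the hint: a str is passed where int is annotated.
# A static checker such as Mypy would report this call; the interpreter does not.
unchecked = double("ab")

# The hints are only metadata stored on the function object.
hints = get_files.__annotations__
print(from_str, from_path, unchecked, sorted(hints))
```

The hints remain available to editors, `help()` and static checkers through `__annotations__`, which is why they work as documentation even though the interpreter ignores them.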
RE: Loop through all files in a directory? - Gribouillis - Apr-23-2024

(Apr-23-2024, 08:11 AM) snippsat Wrote: To mix os and pathlib may be more confusing than it needs to be; this does just the same with pathlib alone.

What I don't like about the pathlib solution is the call to .is_file() for every path, which makes a system call, while os.walk() produces the list of files with an internal mechanism. This needs to be checked, but I suspect that os.walk() makes fewer system calls.
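Some background on that suspicion: since Python 3.5, os.walk() is built on os.scandir(), whose DirEntry objects in most cases already know from the directory listing whether an entry is a file or a directory, so a separate stat call per entry is usually not needed. The sketch below writes the same idea by hand; the function name `scan_files` and the temporary test tree are assumptions made for this example only.

```python
import os
import tempfile
from collections.abc import Iterator


def scan_files(root: str) -> Iterator[str]:
    """Recursively yield file paths below root using os.scandir()."""
    with os.scandir(root) as entries:
        for entry in entries:
            # DirEntry caches the file type from the directory listing,
            # so these checks usually need no extra system call.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_files(entry.path)
            elif entry.is_file():
                yield entry.path


with tempfile.TemporaryDirectory() as tmp:
    # Small throwaway tree: one file at the top, one in a sub-directory.
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x")

    found = sorted(os.path.relpath(p, tmp) for p in scan_files(tmp))
    walked = sorted(
        os.path.relpath(os.path.join(d, f), tmp)
        for d, _, files in os.walk(tmp)
        for f in files
    )
    print(found)
```

On a plain tree like this one both approaches list the same files. They can differ when symlinks are involved, because os.walk() puts every non-directory entry into its files list, while this sketch only yields entries for which is_file() is true.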
RE: Loop through all files in a directory? - DeaD_EyE - Apr-23-2024

Type hints are not required, but they help library developers communicate what a method or function expects and what it should return.

Quote: What I don't like with the pathlib solution is the call to .is_file() for every path, which does a system call while os.walk() produces the list of files with internal mechanism. This needs to be checked but I suspect that os.walk() does less system calls.

It makes fewer calls and is faster.
pathlib.Path.walk was added in Python 3.12: https://docs.python.org/3/library/pathlib.html#pathlib.Path.walk

This is similar to os.walk, but has some differences in the handling of symlinks. The last example uses it.

walk1.py

    import os

    for root, dirs, files in os.walk("/usr"):
        for file in files:
            ...

walk2.py

    from pathlib import Path

    for element in Path("/usr").rglob("*"):
        element.is_file()

walk3.py

    from pathlib import Path

    for root, dirs, files in Path("/usr").walk():
        for file in files:
            pass
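To try the three variants without depending on the contents of /usr, a rough sketch like the following could be used. It is not the benchmark from this post: the function names, the tree size (20 directories with 50 empty files each) and the use of time.perf_counter() are all assumptions for illustration. Path.walk() is only included when the running Python has it (3.12+). Timings depend heavily on the OS, filesystem and cache state, so no expected numbers are given.

```python
import os
import tempfile
import time
from pathlib import Path


def count_os_walk(root):
    # Same idea as walk1.py: os.walk() hands over the file names directly.
    return sum(len(files) for _, _, files in os.walk(root))


def count_rglob(root):
    # Same idea as walk2.py: rglob() plus an is_file() check per path.
    return sum(1 for p in Path(root).rglob("*") if p.is_file())


def count_path_walk(root):
    # Same idea as walk3.py: Path.walk() exists only on Python 3.12+.
    return sum(len(files) for _, _, files in Path(root).walk())


def timed(func, root):
    """Return (file count, elapsed seconds) for one traversal."""
    start = time.perf_counter()
    count = func(root)
    return count, time.perf_counter() - start


results = {}
with tempfile.TemporaryDirectory() as tmp:
    # Arbitrary test tree: 20 directories with 50 empty files each.
    for i in range(20):
        sub = Path(tmp, f"dir{i}")
        sub.mkdir()
        for j in range(50):
            (sub / f"file{j}.txt").touch()

    candidates = [count_os_walk, count_rglob]
    if hasattr(Path, "walk"):
        candidates.append(count_path_walk)

    for func in candidates:
        results[func.__name__] = timed(func, tmp)

for name, (count, seconds) in results.items():
    print(f"{name}: {count} files in {seconds:.4f}s")
```

A tree this small mostly measures interpreter overhead; pointing the functions at a large real directory gives a more meaningful comparison.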