Python Forum
short version of os.listdir()



short version of os.listdir() - Skaperen - May-12-2024

is there a way to shorten os.listdir(), such as having it read only several names (maybe 32 to 256) at a time? i need to scan through a massively huge directory and it is having trouble with it being so big. the directory has well over 70 million files.


RE: short version of os.listdir() - deanhystad - May-12-2024

Don’t use os.listdir. Use pathlib.iterdir
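A minimal sketch of that suggestion (the method is pathlib.Path.iterdir()), combined with itertools.islice() so that only the first few names are taken. The directory "." and the count of 32 are placeholders, and first_names is a hypothetical helper. One hedge worth knowing: in at least some CPython releases, Path.iterdir() is implemented on top of os.listdir(), so the full name list may still be built before the first Path is yielded. os.scandir() is the standard-library function documented as returning a lazy iterator.

```python
from itertools import islice
from pathlib import Path


def first_names(directory, count=32):
    """Return the names of up to `count` entries in `directory`.

    Path.iterdir() yields one Path per entry (never '.' or '..');
    islice() stops pulling from it after `count` items.
    """
    return [p.name for p in islice(Path(directory).iterdir(), count)]


if __name__ == "__main__":
    # Placeholder directory; replace with the real one.
    for name in first_names(".", 32):
        print(name)
```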


RE: short version of os.listdir() - Pedroski55 - May-12-2024

Fun with generators!

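from pathlib import Path
from itertools import islice
import sys

mydir = Path('/home/pedro')
# A generator expression: no paths are produced until something iterates over it
filelist = (filename for filename in mydir.rglob("*") if filename.is_file())
print(type(filelist))             # <class 'generator'>
print(sys.getsizeof(filelist))    # 104 here: the size of the generator object, not of the file list
total = sum(1 for f in filelist)  # takes a couple of seconds
print(total)                      # 193820 here

# show some of the files (a generator can only be consumed once, so recreate it)
filelist = (filename for filename in mydir.rglob("*") if filename.is_file())
for f in islice(filelist, 32):    # stops early instead of raising StopIteration if there are fewer than 32
    print(f)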
Apparently, recent Python versions (3.12 and later) add Path.walk() to pathlib, just like os.walk() (I don't have the latest Python!)

import pathlib

# Path.walk() was added in Python 3.12
path = pathlib.Path(r"E:\folder")
for root, dirs, files in path.walk():
    print("Root: ")
    print(root)
    print("Dirs: ")
    print(dirs)
    print("Files: ")
    print(files)
    print("")
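For Python versions before 3.12, os.walk() yields the same (root, dirs, files) tuples. A minimal equivalent of the loop above, wrapped in a hypothetical print_tree() helper that also returns the total file count; one difference is that os.walk() gives root as a str, while Path.walk() gives a Path.

```python
import os


def print_tree(top):
    """Print each directory's path, subdirectory names and file names.

    A stand-in for Path.walk() on Python < 3.12: os.walk() yields
    (dirpath, dirnames, filenames) tuples, top-down by default.
    Returns the total number of files seen.
    """
    total = 0
    for root, dirs, files in os.walk(top):
        print("Root:", root)   # a str here; Path.walk() yields a Path
        print("Dirs:", dirs)
        print("Files:", files)
        print()
        total += len(files)
    return total
```

Usage: print_tree(r"E:\folder") with the same placeholder path as above.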
What do you want to do with 70 million files??


RE: short version of os.listdir() - Gribouillis - May-12-2024

In addition to pathlib.Path.iterdir(), you can use more_itertools.chunked() to get the names in batches.
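more_itertools is a third-party package (pip install more-itertools). To keep this sketch self-contained it uses a small stand-in built from itertools.islice() that yields lists of up to n items, the same shape more_itertools.chunked() produces; Python 3.12 also added itertools.batched(), which yields tuples instead. The batch size of 256 and the name_batches helper are hypothetical.

```python
from itertools import islice
from pathlib import Path


def chunked(iterable, n):
    """Yield successive lists of at most n items from iterable.

    A stdlib-only stand-in for more_itertools.chunked(); the last
    list may be shorter than n. islice() never consumes more than
    n items per call, so nothing is lost between chunks.
    """
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def name_batches(directory, size=256):
    """Yield the entry names of `directory` in lists of up to `size`."""
    for batch in chunked(Path(directory).iterdir(), size):
        yield [p.name for p in batch]
```

Only one batch of names is held at a time in the caller's loop, although (as noted for iterdir() itself) the underlying listing may still be materialized by the library.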


RE: short version of os.listdir() - snippsat - May-12-2024

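You can also use itertools.islice() to slice a generator.
Here only the files at positions 5 to 10 (or, say, 32 to 256) are yielded, so only those paths are kept in memory. islice() still has to step through the earlier items to reach the start position; it just discards them.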
from pathlib import Path
from itertools import islice

def generate_paths(directory):
    for path in Path(directory).rglob('*'):
        if path.is_file():
            yield path

if __name__ == '__main__':
    dest = r'C:\Test'
    # Slice into the generator to get files in range specified
    selected_files = islice(generate_paths(dest), 5, 11)
    for path in selected_files:
        print(path)
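A quick check of that last point, using a hypothetical counting_gen() in place of generate_paths(): asking islice() for positions 5 through 10 makes the generator produce every item from 0 through 10, even though only six are returned.

```python
from itertools import islice


def counting_gen(n, produced):
    """Hypothetical stand-in for generate_paths(): records every item it yields."""
    for i in range(n):
        produced.append(i)
        yield i


produced = []
selected = list(islice(counting_gen(100, produced), 5, 11))
print(selected)   # [5, 6, 7, 8, 9, 10]
print(produced)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

So slicing saves memory, not the work of generating (and, for paths, scanning) the skipped items.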



RE: short version of os.listdir() - Skaperen - May-15-2024

(May-12-2024, 05:14 AM)Pedroski55 Wrote: What do you want to do with 70 million files??
reduce it to about 700 files or maybe even fewer.


RE: short version of os.listdir() - Skaperen - May-15-2024

(May-12-2024, 03:15 AM)deanhystad Wrote: Don’t use os.listdir. Use pathlib.iterdir
it gives me only ONE (1) at a time. i guess that's what "iter" implies. this is going to take "forever". is there a way to get something like 256 at a time, or at least do one read from the directory per block that the names are stored in?
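Getting names one at a time in Python does not necessarily mean one disk read per name. A hedged sketch using os.scandir() (a standard-library function, swapped in here for Path.iterdir()): on POSIX systems it is implemented with opendir()/readdir(), and C libraries such as glibc typically fill a buffer with many directory entries per underlying system call, so the Python loop mostly consumes names from that buffer. DirEntry.is_file() can also usually answer from the directory entry itself, without the separate stat() call that Path.is_file() makes. The function name regular_file_names is hypothetical.

```python
import os


def regular_file_names(directory):
    """Yield the names of regular files directly inside `directory`.

    os.scandir() returns a lazy iterator of os.DirEntry objects; using it
    as a context manager closes the directory handle even if the caller
    stops iterating early.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False: symlinks are not counted as files, and
            # the type usually comes from the directory entry, not a stat().
            if entry.is_file(follow_symlinks=False):
                yield entry.name
```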


RE: short version of os.listdir() - Skaperen - May-15-2024

the reason i want 32 to 256 at a time is not so i can have a loop handle them one at a time. it's so i can get all the names in a directory block with a single physical read operation. i created a test directory and was able to fit 243 files into a single directory block.

re-phrased: my goal is to read the entire directory as fast as possible to get the list of names and write that list to a file. then i will run things to filter that huge list down to the few files i actually need, based only on whether each name fits one of a collection of patterns. i don't need to open any of these files yet.
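A sketch of that two-step plan, under a few assumptions: names are written one per line to a plain text file (a name containing a newline would break this format), os.scandir() does the listing, and the filter uses shell-style patterns via fnmatch. The function names, file name and example patterns are hypothetical.

```python
import fnmatch
import os


def dump_names(directory, out_path):
    """Write every entry name in `directory` to `out_path`, one per line.

    os.scandir() is consumed lazily and each name is written immediately,
    so the full list is never held in memory. Returns the number written.
    """
    count = 0
    # surrogateescape round-trips names that are not valid UTF-8 (assuming a
    # UTF-8 filesystem encoding); newline="\n" disables newline translation.
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape",
              newline="\n") as out:
        with os.scandir(directory) as entries:
            for entry in entries:
                out.write(entry.name + "\n")
                count += 1
    return count


def filter_names(list_path, patterns):
    """Yield saved names that match any of the shell-style `patterns`."""
    with open(list_path, encoding="utf-8", errors="surrogateescape",
              newline="\n") as f:
        for line in f:
            name = line.rstrip("\n")
            # fnmatchcase() is case-sensitive on every platform
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                yield name
```

Usage: dump_names("/big/dir", "names.txt") once, then iterate filter_names("names.txt", ["*.log", "report_*"]) as often as needed without touching the directory again. Keep the output file outside the directory being scanned.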

hmmm, how to open a directory as a file in Python? trivial in C. never done this in Python. maybe os.open() and os.read().
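The os.open()/os.read() idea can be checked directly. A sketch, assuming Linux: os.open() with O_RDONLY succeeds on a directory, but os.read() on that descriptor fails with EISDIR, which Python raises as IsADirectoryError. The descriptor is still useful, though: since Python 3.7, os.scandir() on Unix accepts a file descriptor that refers to a directory. The helpers try_read_directory and names_via_fd are hypothetical.

```python
import os


def try_read_directory(path):
    """Return the exception raised by os.read() on a directory descriptor.

    On Linux, os.open(path, os.O_RDONLY) succeeds for a directory, but
    read(2) on it fails with EISDIR, surfacing as IsADirectoryError.
    Returns None if the read unexpectedly succeeds.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 4096)
    except OSError as exc:
        return exc
    finally:
        os.close(fd)
    return None


def names_via_fd(path):
    """List entry names by handing an open directory descriptor to os.scandir().

    In CPython, os.scandir() works on a duplicate of the descriptor, so the
    original one is still closed here.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with os.scandir(fd) as entries:
            return [entry.name for entry in entries]
    finally:
        os.close(fd)
```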


RE: short version of os.listdir() - Gribouillis - May-15-2024

(May-15-2024, 02:22 AM)Skaperen Wrote: how to open a directory as a file in Python? trivial in C.
How do you do that in C? Isn't it a call to opendir() and a loop of calls to readdir() ?

Hm, ChatGPT told me one can read a directory in C with scandir(). In your case, however, it would use malloc to allocate 70 million character strings. I don't see how you could get only chunks of 256 entries, for example.


RE: short version of os.listdir() - Skaperen - May-15-2024

(May-15-2024, 05:25 PM)Gribouillis Wrote:
(May-15-2024, 02:22 AM)Skaperen Wrote: how to open a directory as a file in Python? trivial in C.
How do you do that in C? Isn't it a call to opendir() and a loop of calls to readdir() ?
to open a directory as a file you simply do the same steps you would do for a regular file: open() and read(). doing it "the file way" in Python, on a directory, raises IsADirectoryError.
(May-15-2024, 05:25 PM)Gribouillis Wrote: Hm, ChatGPT told me one can read a directory in C with scandir(). In your case, however, it would use malloc to allocate 70 million character strings. I don't see how you could get only chunks of 256 entries, for example.
i would not use scandir() (i have never used it for any purpose). if i were doing this in C, i would use read() after opening it with open(). i think readdir() buffers whatever it gets from each read() instead of the whole directory at once, which could make it usable (instead of duplicating code to slice up a directory). i need to try more things in Python first, before i drop back to C to do this. i have zero experience mixing C and Python.