Thread: short version of os.listdir() (/thread-42109.html)
Python Forum (https://python-forum.io), General Coding Help
short version of os.listdir() - Skaperen - May-12-2024

is there a way to shorten os.listdir() so that it only reads several names (maybe 32 to 256) at a time? i need to scan through a massively huge directory and it is having trouble with it being so big. the directory has well over 70 million files.

RE: short version of os.listdir() - deanhystad - May-12-2024

Don’t use os.listdir. Use pathlib.Path.iterdir().

RE: short version of os.listdir() - Pedroski55 - May-12-2024

Fun with generators!

```python
from pathlib import Path
import sys

mydir = Path('/home/pedro')
# Generator expression: nothing is read from disk until it is iterated
filelist = (filename for filename in mydir.rglob("*") if filename.is_file())
type(filelist)  # generator
sys.getsizeof(filelist)  # returns 104
total = sum(1 for f in filelist)  # takes a couple of seconds then returns 193820

# show some of the files (the generator above is exhausted, so rebuild it)
filelist = (filename for filename in mydir.rglob("*") if filename.is_file())
for f in range(32):
    print(next(filelist))
```

Apparently, in the latest Python, pathlib has .walk() just like os (I don't have the latest Python!)

```python
import pathlib

path = pathlib.Path(r"E:\folder")
for root, dirs, files in path.walk():
    print("Root: ")
    print(root)
    print("Dirs: ")
    print(dirs)
    print("Files: ")
    print(files)
    print("")
```

What do you want to do with 70 million files??

RE: short version of os.listdir() - Gribouillis - May-12-2024

In addition to pathlib.Path.iterdir(), you can use more_itertools.chunked().
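
A minimal sketch of the chunking idea, using only the standard library: itertools.islice stands in for more_itertools.chunked(), and os.scandir() stands in for Path.iterdir(). os.scandir() is documented as a lazy iterator, whereas in several CPython releases Path.iterdir() appears to be built on top of os.listdir(), which would bring the whole listing into memory anyway. The names iter_name_batches and batch_size are made up for this example.

```python
import os
from itertools import islice


def iter_name_batches(directory, batch_size=256):
    """Yield lists of up to batch_size entry names from directory."""
    with os.scandir(directory) as entries:
        # Lazily pull names; nothing is collected beyond the current batch
        names = (entry.name for entry in entries)
        while True:
            batch = list(islice(names, batch_size))
            if not batch:
                return
            yield batch
```

Each batch is an ordinary list, so it can be handed to whatever processes names in bulk. Note that the batch size only controls how many names Python holds at once; how many entries come back per read from the filesystem is decided lower down, by the C library and the kernel.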

RE: short version of os.listdir() - snippsat - May-12-2024

Can also use itertools.islice to slice into a generator. So here only the files in the chosen range, e.g. 5-10 or 32-256, are loaded into memory.

```python
from pathlib import Path
from itertools import islice

def generate_paths(directory):
    # Lazily yield every regular file below directory
    for path in Path(directory).rglob('*'):
        if path.is_file():
            yield path

if __name__ == '__main__':
    dest = r'C:\Test'
    # Slice into the generator to get files in range specified
    selected_files = islice(generate_paths(dest), 5, 11)
    for path in selected_files:
        print(path)
```

RE: short version of os.listdir() - Skaperen - May-15-2024

(May-12-2024, 05:14 AM) Pedroski55 Wrote: What do you want to do with 70 million files??

reduce it to about 700 files or maybe even fewer.

RE: short version of os.listdir() - Skaperen - May-15-2024

(May-12-2024, 03:15 AM) deanhystad Wrote: Don’t use os.listdir. Use pathlib.Path.iterdir().

it gives me only ONE (1) at a time. i guess that's what "iter" implies. this is going to take "forever". is there a way to get like 256 at a time, or at least do one input from the directory per block that the names are stored on?

RE: short version of os.listdir() - Skaperen - May-15-2024

the desire to get 32 to 256 at a time is not so i can have a loop do one at a time. it's so i can get all the names from a directory block with a single physical read operation. i created a test directory and was able to put 243 files into a single block of a directory.

re-phrased: my goal is to read the entire directory as fast as possible to acquire the list of names and write that list into a file. then i will run things to filter that huge list down to the few files i actually need, based only on the particular name fitting a collection of patterns. i don't need to open any of these files, yet.

hmmm, how to open a directory as a file in Python? trivial in C. never done this in Python. maybe os.open() and os.read().
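
The two-step plan described above (dump every name to a file, then filter the list by pattern) might look roughly like the sketch below. It assumes os.scandir() is fast enough for the dump: to my knowledge, CPython on Linux implements it with opendir()/readdir(), and the C library normally fetches a buffer of many entries per system call, so iterating one name at a time in Python should not mean one physical read per name. The function names, the one-name-per-line file format, and the use of fnmatch-style patterns are all assumptions made for illustration.

```python
import fnmatch
import os


def dump_names(directory, list_path):
    """Write one entry name per line to list_path and return the count.

    Assumes no file name contains a newline character.
    """
    count = 0
    with os.scandir(directory) as entries:
        # surrogateescape lets names that are not valid UTF-8 round-trip
        with open(list_path, "w", encoding="utf-8",
                  errors="surrogateescape") as out:
            for entry in entries:
                out.write(entry.name + "\n")
                count += 1
    return count


def filter_names(list_path, patterns):
    """Yield names from list_path that match any shell-style pattern."""
    with open(list_path, encoding="utf-8",
              errors="surrogateescape") as listing:
        for line in listing:
            name = line.rstrip("\n")
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                yield name
```

The list file should live outside the directory being scanned, otherwise it may show up in its own listing. Neither function calls stat() on or opens the listed files, so only the directory itself is read.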

RE: short version of os.listdir() - Gribouillis - May-15-2024

(May-15-2024, 02:22 AM) Skaperen Wrote: how to open a directory as a file in Python? trivial in C.

How do you do that in C? Isn't it a call to opendir() and a loop of calls to readdir()?

Hm, ChatGPT told me one can read a directory in C with scandir(). In your case, however, it would use malloc to allocate 70 million character strings. I don't see how you could get only chunks of 256 entries, for example.

RE: short version of os.listdir() - Skaperen - May-15-2024

(May-15-2024, 05:25 PM) Gribouillis Wrote: (May-15-2024, 02:22 AM) Skaperen Wrote: how to open a directory as a file in Python? trivial in C.
How do you do that in C? Isn't it a call to opendir() and a loop of calls to readdir()?

to open a directory as a file you simply do the steps you would do if it is a regular file, open() and read(). doing "the file way" in Python, on a directory, raises IsADirectoryError.

(May-15-2024, 05:25 PM) Gribouillis Wrote: Hm, ChatGPT told me one can read a directory in C with scandir().

i would not use scandir() (i have never used it for any purpose). if i were doing this in C, i would use read(), if i opened it with open(). i think readdir() buffers whatever it gets when it does read() instead of the whole directory all at once, which could make it usable (instead of duplicating code to slice up a directory). i need to try more things with Python, first, before i drop back to C to do this. i have zero experience mixing C and Python.
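
For trying this from Python before dropping back to C: os.open() can open a directory, but on Linux a raw read() on that descriptor fails with EISDIR, which Python reports as IsADirectoryError. On Unix with Python 3.7 or later, the descriptor can instead be passed to os.scandir(), which iterates the entries without the full list being built. A minimal sketch under those assumptions (names_from_fd is a made-up helper name):

```python
import os


def names_from_fd(directory):
    """Yield entry names by scanning an already-open directory descriptor.

    Assumes a Unix-like system where os.scandir() supports file descriptors.
    """
    fd = os.open(directory, os.O_RDONLY)
    try:
        # scandir() does not close the descriptor it is given
        with os.scandir(fd) as entries:
            for entry in entries:
                yield entry.name
    finally:
        os.close(fd)
```

Whether os.scandir in os.supports_fd is true can be checked at runtime before relying on this.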