May-15-2024, 07:11 PM
(May-15-2024, 06:05 PM)Skaperen Wrote: i would not use scandir() (i have never used it for any purpose).
Maybe not directly, but all directory iteration in Python inherits from this after PEP 471 – os.scandir() function – a better and faster directory iterator.
Starting from Python 3.5, the implementation of os.listdir was modified to use os.scandir.
The same goes for e.g. Path.iterdir(), Path.glob() and Path.rglob(): they all use the same C code underneath (the C speedups for the scandir module).
Just to make clear with my code example: it does not matter that a directory has e.g. 70 million files, as you mention.
You can read it in chunks of whatever size; the point is to read only the specified files into memory, not all of them.
Here I read files from index 12000 to 12005 in a folder that has 15k files; with os.listdir it would read all 15k entries.

from pathlib import Path
from itertools import islice

def generate_paths(directory):
    # Lazily yield every file under `directory`, one at a time
    for path in Path(directory).rglob('*'):
        if path.is_file():
            yield path

if __name__ == '__main__':
    dest = r'G:\div_code'
    # Slice into the generator to get files in the specified range
    selected_files = islice(generate_paths(dest), 12000, 12005)
    for path in selected_files:
        print(path)
Output:
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\bar.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\box.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\cells.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\color.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\color_triplet.py
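To read in chunks of whatever size, islice can be wrapped in a small helper. This is just a sketch; the `chunked` name and the batch size are my own, not from the standard library:

```python
from itertools import islice

def chunked(iterable, size):
    # Yield successive lists of at most `size` items from any iterable,
    # consuming it lazily so the whole thing never sits in memory
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Works with any iterable, e.g. the generate_paths() generator above;
# shown here with a plain range for a self-contained demo
for batch in chunked(range(10), 4):
    print(batch)
# → [0, 1, 2, 3]
# → [4, 5, 6, 7]
# → [8, 9]
```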