Python Forum
short version of os.listdir()
#11
(May-15-2024, 06:05 PM)Skaperen Wrote: i would not use scandir() (i have never used it for any purpose).
Maybe not directly, but all directory iteration in Python has built on it since PEP 471 (os.scandir() function – a better and faster directory iterator).
Starting with Python 3.5, the implementation of os.listdir was modified to use os.scandir.
The same goes for e.g. Path.iterdir(), Path.glob() and Path.rglob(): they all use the same C code underneath (the C speedups for the scandir module).

Just to be clear about my code example: it does not matter that a directory has e.g. 70 million files, as you mention.
You can read it in chunks of whatever size; the point is that only the specified files are read into memory, not all of them.
Here I read files 12000 to 12005 from a folder that has 15k files; with os.listdir, all 15k file names would be read.
from pathlib import Path
from itertools import islice

def generate_paths(directory):
    for path in Path(directory).rglob('*'):
        if path.is_file():
            yield path

if __name__ == '__main__':
    dest = r'G:\div_code'
    # Slice into the generator to get files in range specified
    selected_files = islice(generate_paths(dest), 12000, 12005)
    for path in selected_files:
        print(path)
Output:
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\bar.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\box.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\cells.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\color.py
G:\div_code\answer\ibm_env\Lib\site-packages\pip\_vendor\rich\color_triplet.py
#12
(May-15-2024, 07:11 PM)snippsat Wrote: You can read it in chunks of whatever size; the point is that only the specified files are read into memory, not all of them.
Here I read files 12000 to 12005 from a folder that has 15k files; with os.listdir, all 15k file names would be read.
I don't agree. In your code example, to access the slice 12000 to 12005, the code calls is_file() on at least 12005 paths, which means at least 12005 stat system calls.
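
For a single flat directory, one way to avoid most of those stat calls is to iterate with os.scandir() directly: DirEntry.is_file() can usually answer from the type information returned by the directory read itself, although some filesystems still force a stat. Below is a minimal sketch under those assumptions. The helper name iter_files is hypothetical, and unlike rglob('*') it does not recurse. Entries are still read in order, so reaching index 12000 still means reading 12000 directory entries.

```python
import os
from itertools import islice


def iter_files(directory):
    """Lazily yield paths of regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False so symlinks don't force an extra stat call
            if entry.is_file(follow_symlinks=False):
                yield entry.path


if __name__ == '__main__':
    # Hypothetical usage: skip the first 12000 files and print the next 5.
    for path in islice(iter_files('.'), 12000, 12005):
        print(path)
```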

Besides, Python's scandir is not the same function as C's scandir. The code you linked uses the opendir-readdir-closedir pattern. See Accessing Directories in the GNU C library for example.

I wonder if opening a directory as a file with open() and using read() in C, as @Skaperen does, is documented somewhere. I doubt it is a safe or portable way to read directories. Also, if this works, it may be possible to call the C library functions directly through the ctypes module.

I don't see the point in having directories with 7e7 files. You can store that many files in 4 levels of directories having 100 children each.
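
As a quick check of the arithmetic: 100 choices at each of 4 levels gives 100**4 = 10**8 slots, which is more than 7e7. Here is a minimal sketch of one way to map a file's sequence number to such a path. The function name shard_path, the two-digit directory names, the 'root' prefix and the file_NN naming are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

FANOUT = 100   # children per level
LEVELS = 4     # 100**4 = 100_000_000 slots, enough for 7e7 files


def shard_path(index, root='root'):
    """Map a file index to a path with 3 directory levels plus a leaf name."""
    if not 0 <= index < FANOUT ** LEVELS:
        raise ValueError('index out of range for this layout')
    digits = []
    for _ in range(LEVELS):
        # Peel off one base-100 digit per level, least significant first.
        index, digit = divmod(index, FANOUT)
        digits.append(f'{digit:02d}')
    digits.reverse()
    *dirs, leaf = digits
    return PurePosixPath(root, *dirs, f'file_{leaf}')


if __name__ == '__main__':
    print(shard_path(12345))       # root/00/01/23/file_45
    print(shard_path(69_999_999))  # root/69/99/99/file_99
```

With this layout no directory ever holds more than 100 entries, so listing any one of them stays cheap.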
« We can solve any problem by introducing an extra level of indirection »
#13
(May-16-2024, 06:41 AM)Gribouillis Wrote: I don't see the point in having directories with 7e7 files.
no real point at all. it was an accident (missed typing a '/' where i should have) i worked to correct (got it done, now). the big issue is that when things read the next chunk of file names, something went back to the start of the directory each time, so each chunk was slower and iterating over chunks just got slower and slower while the disk stayed busy. i finally managed to make a full list of files (in C) and broke those into chunks to be copied to a new drive (in bash). took about 5 days, but i could watch the progress. now i have a few thousand subdirectories, with a few hundred files, each, about 6TB worth (and another 8 TB hard drive destined to become another backup space).
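
For comparison, in Python a single pass over the directory can hand out chunks without ever going back to the start, because os.scandir() keeps one open directory iterator for the whole loop. This is a minimal sketch, not the C/bash approach described above; iter_chunks and chunk_size are hypothetical names.

```python
import os
from itertools import islice


def iter_chunks(directory, chunk_size):
    """Yield lists of up to `chunk_size` entry names from one pass over `directory`."""
    with os.scandir(directory) as entries:
        while True:
            # islice resumes from where the previous chunk stopped.
            chunk = [entry.name for entry in islice(entries, chunk_size)]
            if not chunk:
                return
            yield chunk


if __name__ == '__main__':
    # Hypothetical usage: process the current directory 1000 names at a time.
    for number, chunk in enumerate(iter_chunks('.', 1000)):
        print(number, len(chunk))
```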
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.