Python Forum
Anyone play with os.scandir? It is awesome!
#1
I'm just wondering if anyone else has played with the following:

I have been fed up with Windows 7 Explorer search for a long time,
but with grep available for Windows, I have never tried to write a
search tool for it myself.

Recently, I discovered os.scandir, written by Ben Hoyt.

see: https://www.python.org/dev/peps/pep-0471/
and: http://benhoyt.com/writings/scandir/

I have not done much with it yet, but I ran it interactively
and iterated over the results, simply writing each item out to disk,
and then opened the file in Notepad++.

As the path, I passed the root directory of a 5 TB drive, of which 0.5 TB was populated.
I didn't think anything had happened when I ran it the first time, because it only ran for a few seconds.
Then when I opened the results file, I was amazed to find 299,681 entries listing every file with its full path.

There's more: I haven't found out how to access it yet, but it also returns the os.stat and 'other' information.
The claim is that it can make directory walking up to 20 times faster than the old os.walk.
#2
I did try it before it became part of the standard library (Python 3.5 onward).
Larz60+ Wrote: There's more: I haven't found out how to access it yet, but it also returns the os.stat and 'other' information.
The claim is that it can make directory walking up to 20 times faster than the old os.walk.
It's built in now, so os.walk() uses os.scandir() internally and gets the same speed-up.
The faster walk was called scandir.walk() in the standalone scandir module before it was folded into os.walk().

Contributing os.scandir() to Python
Ben Hoyt Wrote: It’s now used inside the popular os.walk() function to speed up walking directory trees by a significant amount.
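
On the question of how to get at the os.stat information: each item that os.scandir yields is an os.DirEntry, which has name and path attributes plus is_dir(), is_file() and stat() methods. A minimal sketch follows; the describe_entries helper and the file names in the demo are made up for illustration, and the with-statement form needs Python 3.6 or later.

```python
import os
import tempfile


def describe_entries(path):
    """Return sorted (name, kind, size_in_bytes) tuples for entries directly under path."""
    rows = []
    with os.scandir(path) as it:
        for entry in it:
            # 'file' here simply means "anything that is not a directory".
            kind = 'dir' if entry.is_dir(follow_symlinks=False) else 'file'
            # stat() returns an os.stat_result, the same type os.stat() returns.
            size = entry.stat(follow_symlinks=False).st_size
            rows.append((entry.name, kind, size))
    return sorted(rows)


if __name__ == '__main__':
    # Throwaway directory so the sketch runs on any platform.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'a.txt'), 'w') as fh:
            fh.write('hello')
        os.mkdir(os.path.join(tmp, 'sub'))
        for name, kind, size in describe_entries(tmp):
            print(f'{name:10} {kind:5} {size}')
```

The size reported for a directory depends on the platform and file system, so only the file sizes are meaningful here.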
#3
It seemed to be quite a bit faster than os.walk, but perhaps that's because I wanted it to be faster
rather than because it really was.
#4
os.walk uses os.scandir instead of the much slower os.listdir as of Python 3.5.
PEP 471

There are many people still using Python 2.7 and arguing that it is better than 3.x. I am using Arch Linux, so Python is always the latest stable version. :D

It returns an iterator, so it can be used directly in a loop: for entry in os.scandir('.'). Note that it is a regular iterator, not an asynchronous one, so async for will not work on it.
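
A rough sketch of how the scan could still be used from asyncio code: do the blocking scan in a plain function and hand it to a worker thread with run_in_executor. The helper names list_names and list_names_async are invented for this example, and asyncio.run / get_running_loop assume Python 3.7 or later.

```python
import asyncio
import os


def list_names(path):
    """Blocking helper: return the sorted entry names directly under path."""
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it)


async def list_names_async(path):
    """Run the blocking scan in the default thread pool so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, list_names, path)


if __name__ == '__main__':
    print(asyncio.run(list_names_async('.')))
```

This does not make the scan itself faster; it only keeps other coroutines running while the directory is being read.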
#5
Here's my test routine so far. The scantree method uses an example that I found on Stack Overflow,
with minor modifications. The get_drives method uses os.stat to trigger an exception and so
eliminate drives that show up but are inactive. There's probably a
better way to do this. The output filename is also hard-coded for testing.

It's fast. I will eventually time it and compare it to os.walk, but since os.walk now uses scandir,
I expect similar results.
I use Python 3.6.2, and this is written for Windows 7.

The code:
from ctypes import windll
import string
import os
import sys

class TryScandir:
    def __init__(self):
        self.drives = []

    def get_drives(self):
        bitmask = windll.kernel32.GetLogicalDrives()
        for letter in string.ascii_uppercase:
            if bitmask & 1:
                self.drives.append(letter)
            bitmask >>= 1

        # Keep only drives that respond to os.stat; drives that show up
        # but are inactive raise OSError and are dropped.
        active = []
        for drive in self.drives:
            try:
                os.stat(f'{drive}:/')
            except OSError:
                continue
            active.append(drive)
        self.drives = active

    def scantree(self, path):
        """Recursively yield DirEntry objects for given directory."""
        for entry in os.scandir(path):
            if entry.is_dir(follow_symlinks=False):
                yield from self.scantree(entry.path)
            else:
                yield entry


    def startscan(self):
        # Use self rather than the module-level ts instance.
        self.get_drives()
        print(f'The following drives will be scanned: {self.drives}')
        # The data directory must already exist.
        with open('data/drivetree.txt', 'w', encoding='utf-8') as f:
            for drive in self.drives:
                sdrive = f'{drive}:/'
                print(f'Scanning {sdrive}')
                try:
                    for entry in self.scantree(sdrive):
                        f.write(f'{entry.path}\n')
                except PermissionError as exc:
                    # exc.filename is the path that was refused; the scan
                    # of this drive stops here and the next drive starts.
                    print(f'No permission for {exc.filename}')
                except OSError as exc:
                    print(f'Unexpected error: {exc!r}')


if __name__ == '__main__':
    ts = TryScandir()
    ts.startscan()
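
For the timing comparison mentioned above, something along these lines might do as a starting point. It is only a sketch: count_with_walk, count_with_scandir and time_call are names made up here, and the demo builds a tiny temporary tree just so it runs anywhere. Point it at a real directory to get meaningful numbers.

```python
import os
import tempfile
import time


def scantree(path):
    """Recursively yield DirEntry objects for everything that is not a directory."""
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            yield from scantree(entry.path)
        else:
            yield entry


def count_with_scandir(path):
    return sum(1 for _ in scantree(path))


def count_with_walk(path):
    # os.walk skips unreadable directories silently, while scantree raises.
    # The two counts can also differ when the tree contains symlinks to
    # directories: os.walk lists those under dirnames, scantree yields them.
    return sum(len(files) for _, _, files in os.walk(path))


def time_call(func, path):
    """Return (result, elapsed_seconds) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'a', 'b'))
        for rel in ('one.txt', 'a/two.txt', 'a/b/three.txt'):
            open(os.path.join(root, rel), 'w').close()
        for func in (count_with_walk, count_with_scandir):
            count, seconds = time_call(func, root)
            print(f'{func.__name__}: {count} files in {seconds:.4f}s')
```

A single run mostly measures the file system cache, so it is worth repeating each measurement a few times before drawing conclusions.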
#6
PEP 471 explains that on Windows the entry.is_* results could have been plain attributes, since the directory listing already carries that information, which is good for performance, while on POSIX systems like Linux they can require a system call. They were left as methods on both systems.
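
One practical consequence is that a DirEntry caches what it has learned, so its answers can go stale if the file changes after the scan. A small sketch of that behaviour, using a made-up helper name and file name; call os.stat(entry.path) when up-to-date information is needed.

```python
import os
import tempfile


def cached_vs_fresh_size(directory):
    """Return (cached_size, fresh_size) for a file that is modified after scanning."""
    target = os.path.join(directory, 'sample.txt')
    with open(target, 'w') as fh:
        fh.write('hello')              # 5 bytes
    with os.scandir(directory) as it:
        entry = next(e for e in it if e.name == 'sample.txt')
    entry.stat()                       # first call fills the cache
    with open(target, 'a') as fh:
        fh.write(' world')             # the file is now 11 bytes
    # The DirEntry still reports the cached size; os.stat asks the OS again.
    return entry.stat().st_size, os.stat(entry.path).st_size


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(cached_vs_fresh_size(tmp))
```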