Python Forum

Full Version: glob for dir listing
Hi,
I am trying to list all directories that match a glob pattern.
In each attempt below I cannot get the complete list.

For example:
>>> import glob
>>> dirPattern = '/data/part[0-9]'
>>> for d in glob.glob(dirPattern):
...   print(d)
... 
/data/part8
/data/part2
/data/part4
/data/part7
/data/part3
/data/part1
/data/part5
/data/part9
/data/part6
However, one of the directories is missing: "/data/part10".

I have tried this as well:
>>> dirPattern='/data/part[0-9]{2}'
>>> for d in glob.glob(dirPattern):
...   print(d)
... 
>>> 
>>> dirPattern='/data/part[0-9][0-9]'
>>> for d in glob.glob(dirPattern):
...   print(d)
... 
/data/part10
But as you can see, either nothing appears or only part10 is listed.
Can anybody suggest a pattern that will match part1 through part10?

Thanks
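For reference, here is a minimal sketch of why the three attempts above behave as they do. glob matches names using the shell-style rules of the fnmatch module, where `[0-9]` stands for exactly one character and `{2}` is literal text rather than a repeat count. The list `names` and the helper `matching` are made up for illustration only.

```python
import fnmatch

# Hypothetical directory names, including one that literally ends in "{2}".
names = ["part1", "part9", "part10", "part_xyz", "part5{2}"]

def matching(pattern, candidates=names):
    """Return the candidates that a shell-style pattern matches."""
    return [n for n in candidates if fnmatch.fnmatchcase(n, pattern)]

print(matching("part[0-9]"))       # one digit only
print(matching("part[0-9][0-9]"))  # exactly two digits
print(matching("part[0-9]{2}"))    # "{2}" is matched literally, not as a quantifier
```

So no single bracket pattern covers both one-digit and two-digit names; that takes either two patterns or a regex check on top.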
import glob
# '*' matches any run of characters, so this lists every entry starting with "part"
dirPattern = './part*'
for d in glob.glob(dirPattern):
    print(d)
This works for me.
That would also include other directories, for example "part_xyz".
Only directories with a numeric ending (one or two digits) should be listed.

Glob might not be the answer.
(Mar-07-2017, 03:08 PM) bluefrog wrote: Glob might not be the answer.
Yes, glob only supports simple shell-style wildcards (*, ?, [seq]), not real regular expressions.
You can use os.listdir() or, on newer Python versions (3.5+), os.scandir().
Then you can apply your own regex to the names.
E.g.:
import os
import re

# List the current directory and keep names made of letters followed by 1-2 digits
for f_name in os.listdir():
    if re.match(r'^[A-Za-z]+\d{1,2}$', f_name):
        print(f_name)
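The os.scandir() variant mentioned above could look roughly like this. It is only a sketch: the function name `list_part_dirs` and the fixed "part" prefix are my assumptions, and using scandir in a `with` block needs Python 3.6+. Unlike the listdir() loop, it also skips plain files and sorts numerically so that part10 comes after part9.

```python
import os
import re

# Assumed naming scheme: "part" followed by one or two ASCII digits.
PART_DIR = re.compile(r"part[0-9]{1,2}")

def list_part_dirs(root):
    """Return names of subdirectories of root called part<N>, sorted by N."""
    with os.scandir(root) as entries:
        names = [
            entry.name
            for entry in entries
            if entry.is_dir() and PART_DIR.fullmatch(entry.name)
        ]
    # Sort on the numeric suffix rather than alphabetically.
    return sorted(names, key=lambda name: int(name[len("part"):]))

if __name__ == "__main__":
    for name in list_part_dirs("."):
        print(name)
```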
You can still use glob combined with re matching:

import glob
import re

for d in glob.glob("/data/part[0-9]*"):
    if re.match(r"/data/part\d{1,2}$", d):
        print(d)
Compared to os.listdir(), this only iterates over the list that glob has already prefiltered; in practice the result should be the same.

And if you don't have mixed names like part2x (or longer numbers like part100), then even glob("/data/part[0-9]*") alone would work.
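If you would rather stay with glob only, another option is to run the one-digit and two-digit patterns separately and combine the results. A rough sketch, assuming the directories live under a base path such as /data; the function name `glob_part_dirs` is made up for illustration.

```python
import glob
import os

def glob_part_dirs(base="/data"):
    """Return directories named part<1-2 digits> under base, sorted by number."""
    # Escape the base path so any special characters in it are taken literally.
    safe_base = glob.escape(base)
    one_digit = glob.glob(os.path.join(safe_base, "part[0-9]"))
    two_digit = glob.glob(os.path.join(safe_base, "part[0-9][0-9]"))
    # glob also returns plain files, so keep directories only.
    dirs = [p for p in one_digit + two_digit if os.path.isdir(p)]
    return sorted(dirs, key=lambda p: int(os.path.basename(p)[len("part"):]))

if __name__ == "__main__":
    for d in glob_part_dirs("/data"):
        print(d)
```

No regex is needed here because each pattern fixes the number of digits exactly, so names like part2x or part100 never match.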
Great, thanks!
I should've used a regex from the start.

The only thing for us to consider, however, is that we often have to query Hadoop filesystems. So although your suggestion will work for data on shared Linux file systems, I don't think it will work for Hadoop.

I'll have to experiment with prefixes.
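One thing that may help there: the regex part does not care where the names come from. As a sketch, assuming you already have the directory listing as a list of path strings (however it was obtained from Hadoop, which is not shown here), a filter like the following would work unchanged. The function name `filter_part_paths` is hypothetical.

```python
import posixpath
import re

# Assumed naming scheme: last path component is "part" plus one or two digits.
PART_NAME = re.compile(r"part[0-9]{1,2}")

def filter_part_paths(paths):
    """Keep only the paths whose last component looks like part<1-2 digits>."""
    kept = []
    for path in paths:
        # Strip a trailing slash so "/data/part10/" is treated like "/data/part10".
        last = posixpath.basename(path.rstrip("/"))
        if PART_NAME.fullmatch(last):
            kept.append(path)
    return kept

print(filter_part_paths(["/data/part1", "/data/part10/", "/data/part_xyz"]))
```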