Python Forum

glob for dir listing - bluefrog - Mar-07-2017

Hi,
I am attempting to list all directories that match a glob pattern.
In each of the attempts below I cannot obtain a complete list.

For example:
>>> import glob
>>> dirPattern = '/data/part[0-9]'
>>> for d in glob.glob(dirPattern):
...   print(d)
... 
/data/part8
/data/part2
/data/part4
/data/part7
/data/part3
/data/part1
/data/part5
/data/part9
/data/part6
One of the directories is missing, however: "/data/part10".

I have tried this as well:
>>> dirPattern='/data/part[0-9]{2}'
>>> for d in glob.glob(dirPattern):
...   print(d)
... 
>>> 
>>> dirPattern='/data/part[0-9][0-9]'
>>> for d in glob.glob(dirPattern):
...   print(d)
... 
/data/part10
But as you can see, either nothing appears or only part10 is listed.
Can anybody suggest a pattern that will match part1 to part10?

Thanks
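
A note on the empty result for '/data/part[0-9]{2}': glob matches names with fnmatch-style wildcards (*, ?, [seq], [!seq]), so a regex quantifier such as {2} is treated as literal text. The small check below is only a sketch of that behaviour; the sample names are made up for illustration.

```python
import fnmatch

# The same pattern tail the question tried; fnmatch (used by glob) has no
# quantifier syntax, so "{2}" is matched character by character.
pattern = "part[0-9]{2}"

matches_two_digits = fnmatch.fnmatchcase("part10", pattern)
matches_literal_braces = fnmatch.fnmatchcase("part1{2}", pattern)

print(matches_two_digits)      # False: "10" is not "<digit>{2}"
print(matches_literal_braces)  # True: the braces are ordinary characters
```

This is why '[0-9][0-9]' finds part10 while '[0-9]{2}' finds nothing.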


RE: glob for dir listing - camp0 - Mar-07-2017

import glob
dirPattern = './part*'
for d in glob.glob(dirPattern):
   print(d)
This works for me


RE: glob for dir listing - bluefrog - Mar-07-2017

That would also include other directories, for example "part_xyz".
Only directories with a numeric ending (at most 2 digits) should be listed.

Glob might not be the answer.


RE: glob for dir listing - snippsat - Mar-07-2017

(Mar-07-2017, 03:08 PM)bluefrog Wrote: Glob might not be the answer.
Yes, glob only supports simple shell-style wildcards (*, ?, [...]), not full regular expressions.
You can use os.listdir() or, in newer Python versions (3.5+), os.scandir().
Then you can write your own regex.
E.g.:
import os
import re

for f_name in os.listdir():
   if re.match(r'^[A-Za-z]+\d{1,2}$', f_name):
       print(f_name)
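
The os.scandir() variant could look roughly like the sketch below. It assumes Python 3.6+ (for the with-statement support), and the function name numbered_dirs and its parameters are hypothetical, not anything from the thread. Unlike the os.listdir() loop, it also checks is_dir(), so plain files with a matching name are skipped, and it sorts by the numeric suffix.

```python
import os
import re


def numbered_dirs(parent, prefix="part", max_digits=2):
    """Return names of subdirectories of `parent` that are `prefix`
    followed by 1..max_digits digits, sorted by that number.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r"%s[0-9]{1,%d}" % (re.escape(prefix), max_digits))
    with os.scandir(parent) as entries:
        names = [
            entry.name
            for entry in entries
            # fullmatch anchors both ends, so "part2x" or "part100" fail
            if entry.is_dir() and pattern.fullmatch(entry.name)
        ]
    # Sort numerically so part10 comes after part9, not after part1
    return sorted(names, key=lambda name: int(name[len(prefix):]))
```

Calling numbered_dirs('/data') would then give the part1 to part10 directories from the original question, assuming they exist there.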



RE: glob for dir listing - zivoni - Mar-07-2017

You can still use glob combined with re matching:

import glob
import re

for d in glob.glob("/data/part[0-9]*"):
    if re.match(r"/data/part\d{1,2}$", d):
        print(d)
Compared to os.listdir(), this iterates only over a "prefiltered" list; in practice the result should be the same.

And if you don't have mixed names like part2x, then even glob("/data/part[0-9]*") alone would work ...
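
If a regex is not wanted at all, one glob-only option is to run one pattern per digit count and merge the results. This is just a sketch under that assumption; glob_digit_suffix and its parameters are made-up names, and the isdir() filter is an addition so that only directories are returned.

```python
import glob
import os


def glob_digit_suffix(prefix, max_digits=2):
    """Return directories named `prefix` + 1..max_digits digits,
    sorted by the numeric suffix. Hypothetical helper."""
    matches = []
    for n_digits in range(1, max_digits + 1):
        # e.g. "/data/part[0-9]" and then "/data/part[0-9][0-9]";
        # glob.escape protects any wildcard characters in the prefix itself
        pattern = glob.escape(prefix) + "[0-9]" * n_digits
        matches.extend(p for p in glob.glob(pattern) if os.path.isdir(p))
    return sorted(matches, key=lambda p: int(p[len(prefix):]))
```

With the layout from the question, glob_digit_suffix('/data/part') would be expected to return /data/part1 through /data/part10 in numeric order, leaving out names such as part_xyz or part2x.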


RE: glob for dir listing - bluefrog - Mar-07-2017

Great, thanks!
I should've used a regex from the start.

The only thing for us to consider, however, is that we often have to query Hadoop filesystems. So although your suggestion will work for data on shared Linux file systems, I don't think it will do for Hadoop.

I'll have to experiment with prefixes.
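
For what it's worth, the regex step itself does not touch the local filesystem, so it could be applied to any list of path strings, however that listing is obtained. The sketch below assumes the listing is already available as plain strings; filter_numbered and the sample paths are hypothetical, and no particular Hadoop client is implied.

```python
import re


def filter_numbered(paths, prefix, max_digits=2):
    """Keep only paths equal to `prefix` + 1..max_digits digits.

    Works on plain strings, so the source of `paths` does not matter.
    Hypothetical helper for illustration.
    """
    pattern = re.compile(r"%s[0-9]{1,%d}" % (re.escape(prefix), max_digits))
    return [p for p in paths if pattern.fullmatch(p)]


# Made-up listing standing in for whatever a remote filesystem returns
sample_listing = [
    "/data/part1",
    "/data/part10",
    "/data/part_xyz",
    "/data/part2x",
    "/data/part100",
]
selected = filter_numbered(sample_listing, "/data/part")
print(selected)  # ['/data/part1', '/data/part10']
```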