Python Forum

I need to copy all the directories that do not match the pattern
Greetings!
I need to copy all the directories that do not match this pattern:
eight characters that are uppercase letters or digits, then an underscore, then four digits at the end.
Something like this:
ED1234ND_2345
YD1COP1Z_3456
And so on...

I’m having a hell of a time creating a regex for it.

Here is what I got:
import re
from pathlib import Path
tofind = '^[A-Z0-9]{8}_d{4}$' 
for ed in Path('C:/01/TLogs/').iterdir() :
    if ed.is_dir():
        rudir = Path(ed).parts[3]
        #print(f" Dir_Name -{type(rudir)}")
        if re.findall(tofind,rudir) :
            print(f" found -> {rudir}")
I think the problem is in the 'underscore' part of the regex.
Any help is appreciated.
Thank you!
When matching a digit in a regex, you need \d, not d alone.
You can test regexes at regex101.
You can delete line 6 and change line 8 to this:
if re.search(tofind, ed.stem):
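A quick way to see the difference without touching the filesystem is to test both spellings against one of the sample names from the question. This is a small sketch; "ED1234ND_dddd" is a made-up name used only to show what the broken pattern actually accepts:

```python
import re

name = "ED1234ND_2345"

# Without the backslash, "d{4}" means a literal "dddd", not four digits.
broken = re.search('^[A-Z0-9]{8}_d{4}$', name)
# With \d (written as a raw string), it means four digits.
fixed = re.search(r'^[A-Z0-9]{8}_\d{4}$', name)

print(broken)             # None
print(fixed is not None)  # True

# Hypothetical name ending in a literal "dddd": the only kind the broken pattern matches.
print(re.search('^[A-Z0-9]{8}_d{4}$', "ED1234ND_dddd") is not None)  # True
```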
My bad! That's a typo...
It actually looks like this:
tofind = '^[A-Z0-9]{8}_\d{4}$' 
but it does not print anything...
If I remove this part of it:
_\d{4}$

it runs and prints, but it does not filter everything I'm looking for.
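Dropping "_\d{4}$" leaves "^[A-Z0-9]{8}", which only checks that a name starts with eight uppercase letters or digits; whatever follows is ignored. A minimal sketch comparing the shortened and full patterns on names that appear in this thread:

```python
import re

short = r'^[A-Z0-9]{8}'          # prefix check only
full = r'^[A-Z0-9]{8}_\d{4}$'    # the whole name must fit

for name in ["ED1234ND_2345", "D151009EN_7295", "_Y151029E_7345"]:
    print(name, bool(re.search(short, name)), bool(re.search(full, name)))
# ED1234ND_2345 True True
# D151009EN_7295 True False
# _Y151029E_7345 False False
```

So the shortened pattern lets through names like "D151009EN_7295" (nine characters before the underscore), which is why it doesn't filter the way you want.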
It's not hurting you here, but you should get in the habit of using raw strings ("r-strings") for regex patterns, so Python doesn't interpret the backslashes as string escapes before the regex engine sees them.
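To make the r-string advice concrete: "\d" happens to survive in a normal string because it isn't a recognised Python escape (newer Python versions warn about it), but "\b" is one, and it silently turns into a backspace character instead of a regex word boundary. A small sketch:

```python
import re

# In a normal string the backslash must be doubled to mean the same as the raw string.
print('\\d' == r'\d')  # True

# "\b" IS a Python escape (backspace), so the two strings differ.
print(len('\b'), len(r'\b'))  # 1 2

# Normal string: looks for a literal backspace character, so nothing matches.
print(re.search('\bED', 'ED1234ND_2345'))  # None
# Raw string: \b is a word boundary, so this matches "ED" at position 0.
print(re.search(r'\bED', 'ED1234ND_2345').start())  # 0
```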

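We don't have your filesystem, so we can't know what's there, but your pattern seems okay. Your problem may be in the filesystem or in how you're trying to extract the names. As mentioned above, use .stem to pull the final component of a Path (.name does the same but also keeps any suffix, which is safer for folder names that contain a dot).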
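To see what each Path attribute returns, here is a sketch using PureWindowsPath, so it runs on any OS without touching the disk. The second path, with ".old" on the end, is a made-up example:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('C:/01/TLogs/ED1234ND_2345')
print(p.parts)                      # ('C:\\', '01', 'TLogs', 'ED1234ND_2345')
print(p.parts[3], p.name, p.stem)   # all three are 'ED1234ND_2345' here

# Hypothetical folder name containing a dot: .stem drops everything after the last dot.
q = PureWindowsPath('C:/01/TLogs/ED1234ND_2345.old')
print(q.name)  # ED1234ND_2345.old
print(q.stem)  # ED1234ND_2345
```

Indexing parts[3] only lines up because the folders sit exactly three levels below the drive; .name and .stem don't depend on depth.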

Your pattern is anchored, so it can't match multiple times. Use re.match or re.search instead of re.findall.
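A sketch of how the calls differ on the pattern from this thread: with ^ and $ the pattern can match at most once, so findall returns either a one-item list or an empty list, while search and match return a Match object or None. re.fullmatch (Python 3.4+) is another option; it requires the whole string to match, so the anchors aren't needed:

```python
import re

pattern = re.compile(r'^[A-Z0-9]{8}_\d{4}$')
name = "YD1COP1Z_3456"

print(pattern.findall(name))             # ['YD1COP1Z_3456'] (a list, truthy)
print(pattern.findall("mydir"))          # [] (empty list, falsy)
print(pattern.search(name) is not None)  # True
print(pattern.match(name) is not None)   # True

# fullmatch checks the whole string, so ^ and $ can be dropped.
print(re.fullmatch(r'[A-Z0-9]{8}_\d{4}', name) is not None)  # True
```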

import re
tofind = r'^[A-Z0-9]{8}_\d{4}$'

for d in ["ED1234ND_2345", "YD1COP1Z_3456", "mydir"]:
    if re.findall(tofind,d):
        print(f"{d} Matched")
    else:
        print(f"{d}  no match")
Output:
ED1234ND_2345 Matched
YD1COP1Z_3456 Matched
mydir  no match
Thank you for the code!
Could you elaborate on why your snippet works and mine does not?
even if I add your regex it is not printing anything...
It seems exactly the same.

Your code:
tofind = r'^[A-Z0-9]{8}_\d{4}$'
 
for d in ["_Y151029E_7345", "D151009EN_7295", "mydir","small_11","TST___3456","TST_3456","TST3456"]:
    if re.findall(tofind,d):
        print(f"{d} Matched")
    else:
        print(f"{d}  no match")
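As far as I can tell, none of the names in that test list actually fit the pattern, so "no match" for every one of them is the expected result: "_Y151029E_7345" starts with an underscore, and "D151009EN_7295" has nine characters before the underscore. A quick sketch that splits the list (it uses re.fullmatch, so the ^ and $ anchors are dropped):

```python
import re

pattern = re.compile(r'[A-Z0-9]{8}_\d{4}')
names = ["_Y151029E_7345", "D151009EN_7295", "mydir", "small_11",
         "TST___3456", "TST_3456", "TST3456"]

matching = [n for n in names if pattern.fullmatch(n)]
not_matching = [n for n in names if not pattern.fullmatch(n)]

print(matching)           # []
print(len(not_matching))  # 7
```

Since the goal is to copy the directories that do not match, the test you want in the loop is the "not" branch.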
And I wrote this (the directory names are the same):
import re
from pathlib import Path
tofind = r'^[A-Z0-9]{8}_\d{4}$'
for ed in Path('C:\\01\\TLogs').iterdir() :
    if ed.is_dir():
        rudir = Path(ed).parts[3]
        print(f" Dir_Name -{rudir}")
        if re.findall(tofind,rudir) :
            print(f" found -> {rudir}")
If I do a test:
import re
from pathlib import Path

my_dir = r'G:\div_code\foo_folder'
pattern = re.compile(r'^[A-Z0-9]{8}_\d{4}$')
for ed in Path(my_dir).iterdir():
    if ed.is_dir():
        #print(ed)
        # .stem = final path component, without its suffix
        if re.search(pattern, ed.stem):
            print(ed)
            print(ed.stem)
            print('-' * 30)
Output:
G:\div_code\foo_folder\11111111_1111
11111111_1111
------------------------------
G:\div_code\foo_folder\ED1234ND_2345
ED1234ND_2345
------------------------------
G:\div_code\foo_folder\YD1COP1Z_3456
YD1COP1Z_3456
So it's working as I expected. This is the content of foo_folder:
Output:
G:\div_code\foo_folder
λ ls
11111111_1111/
AFILE111_1111.txt
YD1COP1Z_3456/
find_dir.py
'11111111_1111 not'/
ED1234ND_2345/
boy_2.txt
test_folder/
If I want the opposite, i.e. all folders that don't match this pattern:
if not re.search(pattern, ed.stem):
Output:
G:\div_code\foo_folder\11111111_1111 not
11111111_1111 not
------------------------------
G:\div_code\foo_folder\test_folder
test_folder
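Putting the pieces together for the original goal (copying every directory whose name does not fit the pattern) could look something like the sketch below, using shutil.copytree. The function name copy_non_matching and the "src"/"dst" folders in the demo are hypothetical, and dirs_exist_ok needs Python 3.8 or newer:

```python
import re
import shutil
import tempfile
from pathlib import Path

PATTERN = re.compile(r'[A-Z0-9]{8}_\d{4}')

def copy_non_matching(src_root, dst_root):
    """Copy each subdirectory of src_root whose name does NOT fit PATTERN into dst_root.

    Hypothetical helper for illustration; returns the sorted names that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in src_root.iterdir():
        # Skip plain files; use .name so a dot in a folder name isn't cut off.
        if entry.is_dir() and not PATTERN.fullmatch(entry.name):
            # dirs_exist_ok=True merges into an existing destination folder (Python 3.8+).
            shutil.copytree(entry, dst_root / entry.name, dirs_exist_ok=True)
            copied.append(entry.name)
    return sorted(copied)

if __name__ == "__main__":
    # Demo on a throwaway folder that mimics part of foo_folder.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        for name in ["ED1234ND_2345", "11111111_1111 not", "test_folder"]:
            (src / name).mkdir(parents=True)
        (src / "boy_2.txt").write_text("not a folder")
        print(copy_non_matching(src, Path(tmp, "dst")))
        # ['11111111_1111 not', 'test_folder']
```

Keeping the destination outside the source folder avoids iterdir() picking up freshly copied folders.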
(Feb-04-2022, 08:23 AM) tester_V wrote: even if I add your regex it is not printing anything...

If it's not printing anything, then line 7 isn't being reached, and your problem isn't related to the regex. As I mentioned, you might not be pulling the path data properly.

Try putting a statement like print(ed) between lines 4 and 5. Are you getting the paths you expect? Is the fourth component the part you want?
Thank you!
I really appreciate your help!
This is the best forum for Python and not just because of the level of knowledge.
Very friendly attitude, not condescending... you guys are great!
Thank you for the snippet and the coaching again!