Python Forum
Find specific subdir, open files and find specific lines that are missing from a file




Find specific subdir, open files and find specific lines that are missing from a file - tester_V - Aug-22-2020

Hi,

I have a directory with a bunch of subdirectories; each subdirectory contains exactly one file.
I need to process files only from the subdirectories that have the letter "H" in their name.
Each file contains lines with the words "CELL-1", "CELL-2", up to "CELL-12". Those are the lines I'm interested in.
I'd like to scan each file line by line and report which "CELL-XX" lines are present in the file and which are missing from it.

Something like this:

output_file_a.write()
line CELL-1 - missing
line CELL-2 - infile
line CELL-3 - missing
and so on.....

output_file_b.write()
line CELL-1 - infile
line CELL-2 - infile
line CELL-3 - missing
and so on.....
I can find all the files and print out the "CELL-xx" lines that are in each file, but not the ones that are missing.
Thank you.

import os

path = 'c:/path_tosubdirs/'
mytof = 'H'

for name in os.listdir(path):
    hdir_f = os.path.join(path, name)

    # Test the subdirectory's own name, not the full path,
    # so an 'H' elsewhere in the path does not match everything.
    if mytof in name and os.path.isdir(hdir_f):
        for file1 in os.listdir(hdir_f):
            hdir_f1 = os.path.join(hdir_f, file1)
            # Forward slashes here: "\f" inside a normal string is a form feed.
            print("DIR/path/file ->>", hdir_f1)

            with open(hdir_f1) as cells_file:
                for el in cells_file:
                    if 'CELL-' in el:
                        el = el.rstrip()
                        print("CELL-xx ", el)
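For the directory-selection step on its own, here is a minimal sketch using pathlib. The helper name files_in_h_dirs and the throwaway directory names are made up for illustration, and the temporary tree only exists so the example runs anywhere. It checks the subdirectory's own name rather than the full path.

```python
import tempfile
from pathlib import Path

def files_in_h_dirs(root, marker="H"):
    """Yield files inside subdirectories of root whose name contains marker."""
    for sub in sorted(Path(root).iterdir()):
        # Compare against the directory's own name, not the whole path.
        if sub.is_dir() and marker in sub.name:
            for f in sorted(sub.iterdir()):
                if f.is_file():
                    yield f

# Build a throwaway tree so the sketch is self-contained (names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("H01", "X02", "aHb"):
        d = Path(tmp) / name
        d.mkdir()
        (d / "data.txt").write_text("CELL-1\n")
    picked = [f.parent.name for f in files_in_h_dirs(tmp)]

print(picked)
```

Note that the test `marker in sub.name` is case-sensitive, so a directory named "h01" would be skipped.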



RE: Find specific subdir, open files and find specific lines that are missing from a file - ndc85430 - Aug-23-2020

Out of curiosity, is it necessary to write this yourself? Does Windows not have a tool like grep?


RE: Find specific subdir, open files and find specific lines that are missing from a file - Gribouillis - Aug-23-2020

You could try something along the lines of
import re

with open (hdir_f1) as cells_file:
    inum = set(int(match.group(1)) for match in
        (re.search(r"CELL\-(\d+)", line) for line in cells_file) if match)
    for i in range(1, 13):
        print('line CELL-{} - {}'.format(
            i, 'infile' if i in inum else 'missing'))
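If the report is needed as text rather than printed directly (for example to write it to an output file), the same idea can be wrapped in a function. This is only a sketch: the name cell_report, the `expected` parameter and the sample lines are assumptions for illustration, not part of the snippet above.

```python
import re

# Same idea as the pattern above; the capture group holds the cell number.
CELL_RE = re.compile(r"CELL-(\d+)")

def cell_report(lines, expected=range(1, 13)):
    """Return one 'line CELL-n - infile/missing' string per expected cell.

    `lines` is any iterable of strings, for example an open file object.
    """
    found = set()
    for line in lines:
        match = CELL_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return [
        "line CELL-{} - {}".format(i, "infile" if i in found else "missing")
        for i in expected
    ]

# Hypothetical sample data standing in for one file's contents.
sample = ["header\n", "CELL-2 ok\n", "CELL-12 ok\n"]
report = cell_report(sample)
print("\n".join(report))
```

With a real file, something like `cell_report(cells_file)` inside the `with open(...)` block should work the same way, and the returned lines can be joined and passed to an output file's write().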



RE: Find specific subdir, open files and find specific lines that are missing from a file - tester_V - Aug-23-2020

To ndc85430: the solution I'm looking for will be part of a bigger script, and I'd like to keep it all in Python.


RE: Find specific subdir, open files and find specific lines that are missing from a file - millpond - Aug-24-2020

I believe that the proper way to parse the file (and using the matching suggested) would be with readlines.
Only line by line can tell you if the expression does *not* exist on a given line.
I'm not familiar enough with re.search to know if it defaults to line-by-line.
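A small check of both points, as a sketch with made-up file contents (io.StringIO stands in for an open text file): looping over a file object already yields one line at a time, so readlines() is not strictly required, and re.search only examines the string it is given, so calling it once per line gives a per-line answer.

```python
import io
import re

# Hypothetical file contents; io.StringIO behaves like an open text file.
text = "CELL-1 a\nno cell here\nCELL-3 b\n"

# Iterating over a file object yields one line at a time.
iterated = list(io.StringIO(text))
# readlines() gives the same lines, but builds the whole list in memory at once.
read_all = io.StringIO(text).readlines()

# re.search looks only at the string passed in: it returns a match object,
# or None when the pattern does not occur in that line.
hits = [re.search(r"CELL-(\d+)", line) is not None for line in iterated]
print(iterated == read_all)
print(hits)
```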






RE: Find specific subdir, open files and find specific lines that are missing from a file - tester_V - Aug-24-2020

(Aug-24-2020, 06:53 AM)millpond Wrote: I believe that the proper way to parse the file (and using the matching suggested) would be with readlines.
Only line by line can tell you if the expression does *not* exist on a given line.
I'm not familiar enough with re.search to know if it defaults to line-by-line.

Do you think you can show me how to do this? I'd like to know how I could do it. I'm sure there are many other ways to accomplish the task, I just do not see any of them...

Thank you!


RE: Find specific subdir, open files and find specific lines that are missing from a file - tester_V - Aug-24-2020

To Gribouillis.
That snippet you shared works!
Thank you for your help. One more request: could you explain the code, please?
I think I understand it, but probably not.

Thank you!
Tester_V


RE: Find specific subdir, open files and find specific lines that are missing from a file - Gribouillis - Aug-24-2020

tester_V Wrote: Could you explain the code please?
Well, the expression re.search(r"CELL\-(\d+)", line) returns either None or a match object in the sense of the re module. There is a match object if a substring such as CELL-8 was found in the line. From the match object one can get the number 8: that is the value returned by the expression int(match.group(1)). Thus the sequence
sequence = (re.search(r"CELL\-(\d+)", line) for line in cells_file)
is a sequence such as None, None, match, None, match,... with one item per line of the file.

The expression
inum = set(int(match.group(1)) for match in sequence if match)
computes the set of all integers found in the above matches, hence all the integers i such that CELL-i was found in the file (actually, only the first occurrence on each line is taken into account).
Then there is a loop, equivalent to
for i in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]:
    if i in inum:
        print(f"CELL-{i} infile")
    else:
        print(f"CELL-{i} missing")
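To see the intermediate values, here is a small sketch run on a few made-up lines instead of a real file (the list `lines` and the name `shape` are illustrative only). It also shows why capturing the full number with \d+ matters: a plain substring test cannot tell CELL-1 from CELL-12.

```python
import re

# Hypothetical lines standing in for the contents of cells_file.
lines = ["start\n", "CELL-1 12.5\n", "noise\n", "CELL-12 7.1\n"]

# A list instead of a generator, so it can be inspected more than once.
sequence = [re.search(r"CELL\-(\d+)", line) for line in lines]

# One entry per line: None where nothing matched, the matched text otherwise.
shape = [m.group(0) if m else None for m in sequence]
print(shape)

# The set of cell numbers that were found.
inum = set(int(m.group(1)) for m in sequence if m)
print(sorted(inum))

# A substring test wrongly reports CELL-1 as present in a CELL-12 line.
substring_hit = "CELL-1" in "CELL-12 7.1\n"
print(substring_hit)
```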



RE: Find specific subdir, open files and find specific lines that are missing from a file - tester_V - Aug-25-2020

To Gribouillis:
Outstanding!
Thank you for the code and the coaching. I really appreciate it, and probably many others do too.