Python Forum
Find specific subdir, open files and find specific lines that are missing from a file
#1
Hi,

I have a directory with a bunch of subdirectories; each subdirectory holds exactly one file.
I need to process files only from the subdirectories that have the letter "H" in their name.
Each file contains lines with the words "CELL-1", "CELL-2", up to "CELL-12"; those are the lines I'm interested in.
I'd like to scan each file line by line and print which "CELL-XX" lines are present in the file and which are missing from it.

Something like this:

output_file_a.write()
line CELL-1 - missing
line CELL-2 - infile
line CELL-3 - missing
and so on...

output_file_b.write()
line CELL-1 - infile
line CELL-2 - infile
line CELL-3 - missing
and so on...
I can find all the files and print out the "CELL-xx" lines that are in each file, but not the ones that are missing.
Thank you.

import os

path = 'c:/path_tosubdirs/'
mytof = 'H'

for name in os.listdir(path):
    hdir_f = os.path.join(path, name)

    # Only subdirectories with 'H' in their own name (not anywhere in the full path)
    if mytof in name and os.path.isdir(hdir_f):
        for file1 in os.listdir(hdir_f):
            hdir_f1 = os.path.join(hdir_f, file1)
            # Raw string: in a plain string "\f" would be read as a form-feed escape
            print(r"DIR\path\file ->>", hdir_f1)

            with open(hdir_f1) as cells_file:
                for el in cells_file:
                    if 'CELL-' in el:
                        el = el.rstrip()
                        print("CELL-xx ", el)
Reply
#2
Out of curiosity, is it necessary to write this yourself? Does Windows not have a tool like grep?
Reply
#3
You could try something along the lines of
import re

with open (hdir_f1) as cells_file:
    inum = set(int(match.group(1)) for match in
        (re.search(r"CELL\-(\d+)", line) for line in cells_file) if match)
    for i in range(1, 13):
        print('line CELL-{} - {}'.format(
            i, 'infile' if i in inum else 'missing'))
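One thing to watch with the pattern above: it also matches inside longer tokens such as XCELL-1, and it accepts any number, not just 1 to 12. Below is a minimal sketch of a stricter variant, assuming the cell labels appear as standalone words. The names CELL_RE and cell_numbers are made up for illustration and are not part of the original snippet.

```python
import re

# Hypothetical stricter pattern: \b requires a word boundary on both sides,
# so "XCELL-1" and "CELL-1x" are not treated as cell labels.
CELL_RE = re.compile(r"\bCELL-(\d+)\b")


def cell_numbers(lines, lowest=1, highest=12):
    """Return the set of CELL numbers within [lowest, highest] found in lines."""
    found = set()
    for line in lines:
        match = CELL_RE.search(line)
        if match:
            number = int(match.group(1))
            # Numbers outside the expected range (e.g. CELL-120) are ignored.
            if lowest <= number <= highest:
                found.add(number)
    return found
```

The function accepts any iterable of strings, so an open file object can be passed directly in place of a list.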
Reply
#4
To ndc85430: the solution I'm looking for will be part of a "bigger" script. I'd like to keep it all in Python.
Reply
#5
I believe that the proper way to parse the file (using the suggested matching) would be with readlines.
Only going line by line can tell you whether the expression does *not* exist on a given line.
I'm not familiar enough with re.search to know if it defaults to line-by-line.
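For what it's worth, re.search has no notion of lines: it scans whatever string it is handed and returns the first match or None. The line-by-line behaviour comes from iterating over the file object, which yields one line at a time, so readlines is not strictly required. Here is a small sketch of that, with io.StringIO standing in for an opened file and made-up sample content; line_matches is a hypothetical helper name.

```python
import io
import re


def line_matches(lines):
    """Return one boolean per line: True if the line contains CELL-<number>."""
    return [re.search(r"CELL-(\d+)", line) is not None for line in lines]


# io.StringIO behaves like an opened text file (sample content is made up).
sample = io.StringIO("header\nCELL-1 ok\nnoise\nCELL-3 ok\n")

# Iterating the file object directly yields one line at a time.
print(line_matches(sample))              # [False, True, False, True]

# readlines() gives the same lines, but as a list held in memory all at once.
sample.seek(0)
print(line_matches(sample.readlines()))  # [False, True, False, True]
```

Both forms give the same result; iterating the file object simply avoids loading the whole file first.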




Reply
#6
(Aug-24-2020, 06:53 AM)millpond Wrote: I believe that the proper way to parse the file (using the suggested matching) would be with readlines.
Only going line by line can tell you whether the expression does *not* exist on a given line.
I'm not familiar enough with re.search to know if it defaults to line-by-line.

Do you think you can show me how to do this? I'd like to know how I could do it. I'm sure there are many other ways to accomplish the task; I just don't see any of them...

Thank you!
Reply
#7
To Gribouillis:
The snippet you shared works!
Thank you for your help. One more request for you: could you explain the code, please?
I think I understand it, but probably not.

Thank you!
Tester_V
Reply
#8
tester_V Wrote: Could you explain the code please?
Well, the expression re.search(r"CELL\-(\d+)", line) returns either None or a match object in the sense of the re module. There is a match object if a substring such as CELL-8 was found in the line. With the match object, one can get the number 8; that's the value returned by the expression int(match.group(1)). Thus the sequence
sequence = (re.search(r"CELL\-(\d+)", line) for line in cells_file)
is a sequence such as None, None, match, None, match,... with one item per line of the file.

The expression
inum = set(int(match.group(1)) for match in sequence if match)
computes the set of all integers found in the above matches, hence all the integers i such that CELL-i was found in the file (actually, only the first occurrence on each line is taken into account).
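If a line could hold more than one label, a possible variation is re.findall, which returns every non-overlapping match on the line rather than only the first. This is a sketch only; the thread does not say whether such lines occur, and all_cell_numbers is a made-up name.

```python
import re


def all_cell_numbers(lines):
    """Collect every CELL number on every line, not just the first per line."""
    found = set()
    for line in lines:
        # With a single capture group, findall returns the captured digit strings.
        found.update(int(digits) for digits in re.findall(r"CELL-(\d+)", line))
    return found


print(all_cell_numbers(["CELL-1 and CELL-2", "nothing here", "CELL-2 CELL-7"]))
```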
Then there is a loop, equivalent to
for i in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]:
    if i in inum:
        print(f"CELL-{i} infile")
    else:
        print(f"CELL-{i} missing")
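Putting the pieces of the thread together, the whole task might look roughly like the sketch below. It assumes the layout described in the first post (subdirectories directly under one root, labels CELL-1 to CELL-12) and uses pathlib instead of os.listdir. The names report_lines and process_tree are hypothetical, chosen only for this example.

```python
import re
from pathlib import Path

CELL_RE = re.compile(r"CELL-(\d+)")


def report_lines(text_lines, highest=12):
    """Build one 'infile' or 'missing' report line per expected CELL number."""
    found = {int(m.group(1)) for m in map(CELL_RE.search, text_lines) if m}
    return [f"line CELL-{i} - {'infile' if i in found else 'missing'}"
            for i in range(1, highest + 1)]


def process_tree(root, marker="H"):
    """Yield (file_path, report) for files in subdirectories whose name contains marker."""
    for subdir in sorted(Path(root).iterdir()):
        # Test the subdirectory's own name, not the full path.
        if subdir.is_dir() and marker in subdir.name:
            for file_path in sorted(subdir.iterdir()):
                if file_path.is_file():
                    with file_path.open() as cells_file:
                        yield file_path, report_lines(cells_file)


# Example use (the path is the placeholder from the first post):
# for file_path, report in process_tree("c:/path_tosubdirs/"):
#     print(file_path)
#     print("\n".join(report))
```

Because process_tree yields the report as a list of strings, the caller can print it or write it to a per-file output, whichever the bigger script needs.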
Reply
#9
To Gribouillis:
Outstanding!
Thank you for the code and the coaching. I really appreciate it, and probably many others do too.
Reply

