Python Forum
Help to find a string and read the next lines
#1
Hello, I'm having trouble recognizing a specific line and reading two lines below it. I would like some help because I've tried many things and couldn't get where I need to.

I have a file, test.txt, that needs to be read. Every time the script finds the string "DOSIMETRY_TOTAL_DOSE", it needs to identify that line and read the dose value located two lines below it.

Below is the content of my TEST.TXT file

Quote:TEST.TXT
CHUCK AMARAL, 2020

[DOSIMETRY_TOTAL_DOSE_B: 00]

8.9762

[DOSIMETRY_TOTAL_DOSE_E: 00]

9.7324

[DOSIMETRY_TOTAL_DOSE_B: 01]

20.5469

[DOSIMETRY_TOTAL_DOSE_E: 01]

13.2534

[DOSIMETRY_TOTAL_DOSE_B: 02]

2.2764

[DOSIMETRY_TOTAL_DOSE_E: 02]

7.3634

[DOSIMETRY_TOTAL_DOSE_B: 03]

5.8867

[DOSIMETRY_TOTAL_DOSE_E: 03]

6.2521


So the script should identify the first string on line 4 and read the dose value on line 6 and write it in a file (extractedlines.txt), then identify another string on line 8, read the value on line 10, and write it again at the end of the extractedlines.txt file. The script should continue till the end of the TEST.TXT file.

My python script is:

in_file = "test.txt"
out_file = "extractedlines.txt"
 
search_for = "DOSIMETRY_TOTAL_DOSE"
line_num = 0
lines_found = 0
with open(out_file, 'w') as out_f:
    with open(in_file, "r") as in_f:
        for line in in_f:
            line_num += 1
            if search_for in line:
                lines_found += 1
                print("String '{}' found on line {}...".format(search_for, line_num))
                print("Dose value: ")
                out_f.write(line)
                out_f.write('Dose value: {} \n')  #HERE I DO NOT KNOW HOW TO MAKE THE SCRIPT READ THE VALUE TWO LINES BELOW
 
        print("{} lines were found with the string '{}'...".format(lines_found, search_for))
When I run it, it reads the TXT file, identifies the strings, and saves them in the extractedlines.txt file, but I can't make it skip two lines and read the correct dose value. The result (as seen in my extractedlines.txt) looks like this:

Quote:[DOSIMETRY_TOTAL_DOSE_B: 00]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_E: 00]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_B: 01]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_E: 01]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_B: 02]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_E: 02]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_B: 03]
Dose value: {}
[DOSIMETRY_TOTAL_DOSE_E: 03]
Dose value: {}

As you can see here, I can't get the dose values on the TEST.TXT file for each one of the observations.

Can someone help me? I've been racking my brain with this for a few days and I'm getting nowhere.

Thank you very much in advance for any help.

Best

Chuck
#2
one way:
import os


# Make sure path is set to current directory
os.chdir(os.path.abspath(os.path.dirname(__file__)))


def read_file(filename):
    with open(filename) as fp:
        n = 0
        for line in fp:
            line = line.strip()
            if not len(line):
                continue
            if n == 0:
                n = 1
                continue
            if n == 1:
                nline = line.split(',')
                print(f"name: {nline[0]}, year: {nline[1]}")
            else:
                if (n % 2) == 0: # even
                    # print(f"n: {n}")
                    print(f"{line}: Dose: ", end = '')
                else:
                    print(f"{line}")
            n += 1


if __name__ == '__main__':
    read_file('TEST.TXT')
Output:
name: CHUCK AMARAL, year: 2020
[DOSIMETRY_TOTAL_DOSE_B: 00]: Dose: 8.9762
[DOSIMETRY_TOTAL_DOSE_E: 00]: Dose: 9.7324
[DOSIMETRY_TOTAL_DOSE_B: 01]: Dose: 20.5469
[DOSIMETRY_TOTAL_DOSE_E: 01]: Dose: 13.2534
[DOSIMETRY_TOTAL_DOSE_B: 02]: Dose: 2.2764
[DOSIMETRY_TOTAL_DOSE_E: 02]: Dose: 7.3634
[DOSIMETRY_TOTAL_DOSE_B: 03]: Dose: 5.8867
[DOSIMETRY_TOTAL_DOSE_E: 03]: Dose: 6.2521
#3
Hello Larz60+, thanks for your help and for the script. It worked here with the test file (test.txt). However, when I used the real file (download link for the real file --> https://drive.google.com/open?id=1VqpMrd...JhYMjbwAGY), it returned an error, as you can see below:

Traceback (most recent call last):
  File "C:\Users\Administrador\Desktop\MSL\extractor2.py", line 31, in <module>
    read_file('teste2.txt')
  File "C:\Users\Administrador\Desktop\MSL\extractor2.py", line 20, in read_file
    print(f"name: {nline[0]}, year: {nline[1]}")
IndexError: list index out of range

If you check the real file, you will see that 44 'DOSIMETRY_TOTAL_DOSE' blocks are spread throughout the file, with a lot of other information between them that should be left out of the output file.

I'm trying to change your code right now, but I am still in need of help.

Thank you... best regards, Chuck
#4
Here it may be easier to write a regex.
Using compile and finditer also keeps it efficient for larger files.
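import re

# File name taken from the traceback in post #3; adjust it to your own file
with open('teste2.txt') as f:
    data = f.read()  # the whole file as a single string

# Match a [DOSIMETRY_TOTAL...] header, then capture the next non-whitespace token
pattern = re.compile(r"\[DOSIMETRY_TOTAL.*\]\s+(\S+)")
for match in pattern.finditer(data):
    print(20 * '-')
    print(match.group(0))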
Output:
[DOSIMETRY_TOTAL_DOSE_B: 00]
9.30988
--------------------
[DOSIMETRY_TOTAL_DOSE_E: 00]
8.45142
--------------------
[DOSIMETRY_TOTAL_DOSE_B: 01]
9.18214
--------------------
[DOSIMETRY_TOTAL_DOSE_E: 01]
8.41000
--------------------
[DOSIMETRY_TOTAL_DOSE_B: 02]
8.87531
--------------------
[DOSIMETRY_TOTAL_DOSE_E: 02]
8.35574
.....
The data I tested with is just the whole file read into one string.
group(1) will give the values only.
Output:
9.30988
--------------------
8.45142
--------------------
9.18214
--------------------
8.41000
--------------------
8.87531
--------------------
8.35574
--------------------
9.15688
--------------------
8.40126
--------------------
8.88971
--------------------
8.48842
.....
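To get from the regex to the extractedlines.txt file asked for in the first post, something like the following might work. It is only a sketch: the helper names extract_doses and write_doses and the "Dose value:" output layout are illustrative assumptions, and the pattern assumes each header is followed by whitespace and then the value, as in the sample file.

```python
import re

# Group 1 is the label inside the brackets, group 2 the token that follows it.
PATTERN = re.compile(r"\[(DOSIMETRY_TOTAL_DOSE[^\]]*)\]\s+(\S+)")


def extract_doses(text):
    """Return a list of (label, value) pairs found in the given text."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


def write_doses(in_path, out_path):
    """Read in_path, write one header line and one value line per block."""
    with open(in_path) as in_f:
        pairs = extract_doses(in_f.read())
    with open(out_path, "w") as out_f:
        for label, value in pairs:
            out_f.write("[{}]\nDose value: {}\n".format(label, value))
    return len(pairs)  # number of blocks written
```

Calling write_doses("test.txt", "extractedlines.txt") would then write every header followed by its dose value. Note that the values are kept as strings; convert them with float() if you need numbers.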
#5
Using a regex, as snippsat suggests, should be considered.

The reason my script didn't work with the entire file is probably this:

The sample you provided had TEST.TXT as its first line. I expect that the full file doesn't have that.

I also only checked for the name line at the start of the file. You probably have repeated names throughout the real file.

Either of these two conditions requires changes to my code.
Is this the case?
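If you prefer to stay with a line-by-line loop, one way to avoid depending on the position of the name line is to key on the marker itself. The sketch below is an untested idea against the real file, which I have not seen: scan_doses and write_doses are hypothetical names, and it assumes the dose value is always the first non-blank line after a header.

```python
def scan_doses(lines, search_for="DOSIMETRY_TOTAL_DOSE"):
    """Yield (header, value) pairs from an iterable of lines.

    The value is taken to be the first non-blank line after a header;
    every other line in the file is ignored.
    """
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between header and value
        if search_for in line:
            header = line  # remember the header until its value shows up
        elif header is not None:
            yield header, line
            header = None  # anything else before the next header is discarded


def write_doses(in_file, out_file):
    """Copy every header and its dose value from in_file to out_file."""
    found = 0
    with open(in_file) as in_f, open(out_file, "w") as out_f:
        for header, value in scan_doses(in_f):
            out_f.write("{}\nDose value: {}\n".format(header, value))
            found += 1
    return found
```

With the names from the first post, write_doses("test.txt", "extractedlines.txt") returns the number of blocks found. Because nothing is indexed by position, a missing TEST.TXT line or extra text between blocks should not raise an IndexError.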