Python Forum

txt-file: read and append missing data
I am trying to insert missing data into a large text file.

The file contains information on different types of objects (curves, points and text). Each object is marked with a header, either ".CURVE XXXXX:", ".POINT XXXXX:" or ".TEXT XXXXX:", where XXXXX is the ID number of the object.

I want to go through the lines that describe each curve (the block whose header line starts with ".CURVE") and check whether it has lines that start with "...A_Z" and "...B_Z". If it doesn't, I want to add the lines "...A_Z 0.00" and "...B_Z 0.00", change the line "..NE" to "..NEZ", and then append 0.00 to the end of every line between "..NEZ" and the next header line (which starts with ".CURVE", ".POINT" or ".TEXT").

If this is a part of the text that describes a curve:
Output:
.CURVE XXXXX:
..OBJTYPE Pipe
..QUAL * * * * *
...NR XXXXX
...BB A
...THEME AA
...THEMGRP X
...LENGTH XX.XX
...TR X
...KEY A
..NE
XXXXXXXXX XXXXXXXX
XXXXXXXXX XXXXXXXX
XXXXXXXXX XXXXXXXX
.CURVE XXXXX: #This line marks the start of a new curve/object
I want it to look like this after I have run the code:
Output:
.CURVE XXXXX:
..OBJTYPE Pipe
..QUAL * * * * *
...NR XXXXX
...BB A
...THEME AA
...THEMGRP X
...LENGTH XX.XX
...TR X
...KEY A
...A_Z 0.00
...B_Z 0.00
..NEZ
XXXXXXXXX XXXXXXXX 0.00
XXXXXXXXX XXXXXXXX 0.00
XXXXXXXXX XXXXXXXX 0.00
.CURVE XXXXX: #This line marks the start of a new curve/object
I tried to use the code below, until I found out that the lines describing an object are not written in a specific order. This means that the line that starts with "...KEY" is not useful as a marker. I think what I have to do is split the text file into separate parts, one per header, read the parts separately, append the missing data (if any is missing), and then put all the parts back together. But I am not really sure how to do that. Ideas are most welcome!

import itertools

# 'with' closes both files automatically; the names avoid shadowing the built-in input().
with open('C:/Users/sufi/Desktop/info.txt', 'r') as infile, \
        open('C:/Users/sufi/Desktop/info_edit.txt', 'w') as outfile:
    # Read the file two lines at a time; fillvalue='' avoids a None
    # for line2 when the file has an odd number of lines.
    for line1, line2 in itertools.zip_longest(*[infile] * 2, fillvalue=''):
        if line1.startswith('...A_Z') and line2.startswith('..N'):
            outfile.write(line1 + '...B_Z 0.00\n' + line2)
        elif line1.startswith('...KEY') and line2.startswith('...B_Z'):
            outfile.write(line1 + '...A_Z 0.00\n' + line2)
        elif line1.startswith('...KEY') and line2.startswith('..N'):
            outfile.write(line1 + '...A_Z 0.00\n...B_Z 0.00\n' + line2)
        else:
            outfile.write(line1 + line2)
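The splitting idea described above can be sketched roughly as follows. This is only a minimal illustration, not tested against the real file: the name split_objects and the sample lines are made up for the example, and it assumes that every object starts with a line beginning with .CURVE, .POINT or .TEXT, as stated in the question.

```python
import re

# Header pattern taken from the question: .CURVE, .POINT or .TEXT at line start.
HEADER = re.compile(r'^\.(CURVE|POINT|TEXT)\b')

def split_objects(lines):
    """Group lines into blocks, starting a new block at every object header.

    split_objects is a hypothetical helper name, not part of any library.
    """
    blocks = []
    for line in lines:
        # Open a new block at a header, or for any lines before the first header.
        if HEADER.match(line) or not blocks:
            blocks.append([])
        blocks[-1].append(line)
    return blocks

# Made-up sample lines, shaped like the layout shown in the question.
sample = [
    '.CURVE 1:\n', '..OBJTYPE Pipe\n', '..NE\n', '1 2\n',
    '.POINT 2:\n', '..NE\n', '3 4\n',
]
blocks = split_objects(sample)
```

Because each block keeps its lines unchanged and in order, writing the blocks back one after another reproduces the original file, so a block can be edited on its own before everything is joined again.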
You need to parse the input one way or another. Here is a simple parser. It reads the input lines and calls a method line_event() for each one. Initially this method simply copies the line to the output file. When it meets a line starting with .CURVE, it switches the method to line_event_curve_header(), which keeps copying lines until it meets the line ..NE. Meanwhile it records whether it has seen a line starting with ...A_Z or ...B_Z. At the ..NE line, if neither has been seen, it writes the two missing lines and ..NEZ, then switches the method to line_event_curve_body(), which appends 0.00 to every line until it meets a line that starts a new object.

Here is the (untested) code
import re

class Parser:
    object_pattern = re.compile(r'^[.](CURVE|POINT|TEXT)')
    
    def __init__(self, infile, outfile):
        self.infile = infile
        self.outfile = outfile
        self.line_event = self.line_event_base
        
    def run(self):
        for line in self.infile:
            self.line_event(line)
        self.outfile.flush()
        
    def line_event_base(self, line):
        if line.startswith('.CURVE'):
            self.start_curve(line)
        else:
            self.outfile.write(line)
            
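    def line_event_curve_header(self, line):
        # The attribute is self.outfile (set in __init__), not self.output.
        if line.startswith('...A_Z') or line.startswith('...B_Z'):
            self.seen_ab_z = True
            self.outfile.write(line)
        elif line.rstrip('\n') == '..NE':
            if self.seen_ab_z:
                # Z data already present: copy the rest of the curve unchanged.
                self.line_event = self.line_event_base
                self.outfile.write(line)
            else:
                # Z data missing: add it, rename ..NE and patch the body lines.
                self.outfile.write('...A_Z 0.00\n...B_Z 0.00\n..NEZ\n')
                self.line_event = self.line_event_curve_body
        else:
            self.outfile.write(line)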
            
    def line_event_curve_body(self, line):
        if self.object_pattern.match(line):
            if line.startswith('.CURVE'):
                self.start_curve(line)
            else:
                self.line_event = self.line_event_base
                self.outfile.write(line)
        else:
            self.outfile.write(line.rstrip('\n'))
            self.outfile.write(' 0.00\n')
                
    def start_curve(self, line):
        self.line_event = self.line_event_curve_header
        self.seen_ab_z = False
        self.line_event(line)

if __name__ == '__main__':
    with open('C:/Users/sufi/Desktop/info.txt','r') as infile,\
            open('C:/Users/sufi/Desktop/info_edit.txt','w') as outfile:
        Parser(infile, outfile).run()
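The parser above treats a curve as complete as soon as it sees either ...A_Z or ...B_Z. If the two lines can be missing independently, one possible variation is to patch a curve once all of its lines have been gathered into a list. The sketch below is only an illustration under some assumptions: fix_curve_block is a made-up name, the block is assumed to follow the layout from the question (attribute lines, then ..NE on its own line, then coordinate lines), and converting ..NE to ..NEZ when only one of the two lines is missing is a guess, since the question only describes the case where both are absent.

```python
def fix_curve_block(block):
    """Return a copy of one .CURVE block with missing Z data filled in.

    Hypothetical helper; 'block' is the list of lines of a single curve,
    each ending in a newline.
    """
    has_a = any(line.startswith('...A_Z') for line in block)
    has_b = any(line.startswith('...B_Z') for line in block)
    stripped = [line.rstrip('\n') for line in block]
    if (has_a and has_b) or '..NE' not in stripped:
        return list(block)  # nothing missing, or no '..NE' line to convert

    ne = stripped.index('..NE')
    fixed = list(block[:ne])          # everything before '..NE' is kept as is
    if not has_a:
        fixed.append('...A_Z 0.00\n')
    if not has_b:
        fixed.append('...B_Z 0.00\n')
    fixed.append('..NEZ\n')
    for line in block[ne + 1:]:
        # Append the Z value to coordinate lines; leave blank lines untouched.
        fixed.append(line.rstrip('\n') + ' 0.00\n' if line.strip() else line)
    return fixed

# Made-up example block in the shape shown in the question.
block = ['.CURVE 1:\n', '...KEY A\n', '..NE\n', '10 20\n', '30 40\n']
fixed = fix_curve_block(block)
```

Blocks for .POINT and .TEXT objects would simply be written out unchanged, and only blocks whose first line starts with .CURVE would be passed through this function.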