[Solved] Reading every nth line into a column from txt file

[Solved] Reading every nth line into a column from txt file - Laplace12 - Jun-28-2021

Hey!

I have a text file that I want to sort out. I've coded this and tried dataframe, but that only prints the last line. The code I have now is this, producing the txt file:

with open(output) as file, open(out, 'w') as file_out:
    for line in file:
        if '2101' in line and found:
            a = line.split()
            print(a[1], file=file_out)
        elif 'Lifetimes' in line and found:
            b = line.split()
            print(b[3], b[4], b[5], file=file_out)
        elif 'Std deviations' in line and found:
#            print(c[3:6])
            c = line
            print(deviations(c), file=file_out)
        elif 'Intensities' in line and found:
            d = line.split()
            print(d[3], d[4], d[5], file=file_out)
        elif 'Time-zero' in line and found:
            e = line.split()
            print(e[4], file=file_out)
        else:
            found = True

#This is what I tried so far
with open(out) as a:
    cpt = 0
    for line in a:
        cpt += 1
        if cpt == 8:
            print(line)
            cpt = 0

The 'out' file is like this:

Number
Value1
Deviation
Value2
Deviation
Value3
Deviation
Number
Value1
Deviation
...

So basically the file is now one long list, and I want to sort it so that the Lifetimes are all in one column, Value1 in the next, then its Deviation, Value2, its Deviation and so on. In other words, I want every 8th value in the same column. I'm guessing this could be done with a loop that prints a value, skips the next 7 and prints the following one, with a start number that can be changed from 1 to 7. I need to save the results in another file, so perhaps it would be easier to write the columns into the 'out' file directly without creating so many files. For now it's enough to get the data sorted properly, so even the simplest code that produces columns from the txt file will do!
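
A minimal sketch of the "print one, skip the rest" idea, using list slicing with a step. It assumes each record spans exactly 7 lines, as in the listing above (set RECORD_LEN to 8 if yours really repeat every 8th line). The names every_nth, to_rows and RECORD_LEN, and the sample data, are made up for illustration.

```python
RECORD_LEN = 7  # assumption: lines per record (Number, Value1, Deviation, ...)


def every_nth(lines, start, step=RECORD_LEN):
    """Return every `step`-th line, beginning at index `start` (0-based)."""
    return lines[start::step]


def to_rows(lines, step=RECORD_LEN):
    """Turn a flat list of lines into rows of `step` fields each."""
    columns = [every_nth(lines, start, step) for start in range(step)]
    # zip(*columns) pairs up the i-th entry of every column, i.e. one record.
    return [list(row) for row in zip(*columns)]


# Placeholder data standing in for the stripped lines of the 'out' file.
sample = ['#0', 'a0', 'b0', 'c0', 'd0', 'e0', 'f0',
          '#1', 'a1', 'b1', 'c1', 'd1', 'e1', 'f1']

print(every_nth(sample, 0))      # the "Number" column only
for row in to_rows(sample):
    print('\t'.join(row))        # one record per output line
```

With a real file, the list could be built with something like [line.strip() for line in open(out)], and each joined row written to the result file instead of printed.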

RE: Reading every nth line into a column from txt file - snippsat - Jun-28-2021

How does the original file look? There is probably a better way, but I cannot advise anything without a sample of the original file.

RE: Reading every nth line into a column from txt file - Laplace12 - Jun-28-2021

(Jun-28-2021, 10:54 AM)snippsat Wrote: How does the original file look? There is probably a better way, but I cannot advise anything without a sample of the original file.

Hey, it looks like this:

#0
0.4000 0.1250 2.0446
['Fixed', 'Fixed', '0.0339']
69.2721 9.6726 21.0553
['1.0359', '0.8128', '0.4063']
41.5603
['0.0588', ' ', ' ']
#1
0.4000 0.1250 2.0714
['Fixed', 'Fixed', '0.0344']
70.0338 9.0952 20.8710
['1.0308', '0.8135', '0.4009']
41.5853
['0.0593', ' ', ' ']
#2
0.4000 0.1250 2.0568
['Fixed', 'Fixed', '0.0333']
69.5963 8.7445 21.6592
['1.0411', '0.8177', '0.4072']
41.5541
['0.0603', ' ', ' ']
#3
0.4000 0.1250 2.0321
['Fixed', 'Fixed', '0.0329']
...

With 490 lines in total.

RE: Reading every nth line into a column from txt file - snippsat - Jun-28-2021

What is the output you want from this?
I cannot see why you look for Lifetimes, Std deviations etc. in this.
Are you making this file yourself?
When you put a Python list like ['Fixed', 'Fixed', '0.0339'] in a text file, it loses all its meaning.
You have to parse it back, or you could have done something else, like taking out the values (e.g. the CSV way) before saving the list to a text file.
If you have no control over the text file, then you have to parse it into what you want.
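
To illustrate the point about a list losing its meaning once it is printed into a text file, here is a small sketch of both options: parsing the text back with ast.literal_eval from the standard library, or writing comma-separated values in the first place. The variable names are only for illustration.

```python
import ast

# This is what a printed list looks like when read back: just a string.
line = "['Fixed', 'Fixed', '0.0339']"

# literal_eval rebuilds the list from the text; unlike eval(), it only
# accepts Python literals and does not run arbitrary code.
values = ast.literal_eval(line)
print(values[2])

# Writing plain delimited text instead avoids the parsing step entirely.
csv_line = ','.join(values)
print(csv_line)
print(csv_line.split(','))
```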

RE: Reading every nth line into a column from txt file - Laplace12 - Jun-28-2021

(Jun-28-2021, 01:20 PM)snippsat Wrote: What is the output you want from this?
I cannot see why you look for Lifetimes, Std deviations etc. in this.
Are you making this file yourself?
When you put a Python list like ['Fixed', 'Fixed', '0.0339'] in a text file, it loses all its meaning.
You have to parse it back, or you could have done something else, like taking out the values (e.g. the CSV way) before saving the list to a text file.
If you have no control over the text file, then you have to parse it into what you want.

Alright, I must've explained this quite badly, let me try again! The first part of the code (picking Lifetimes etc.) is just sorting out a file (called output) that looks like this:

CA50_40_ref_data2101_E04_spec0-70 #0                                    
             Lifetimes (ns)   :    0.4000    0.1250    2.0446
             Std deviations   :     Fixed     Fixed    0.0339 
             Intensities (%)  :   69.2721    9.6726   21.0553
             Std deviations   :    1.0359    0.8128    0.4063 
Time-zero    Channel number   :   41.5603
             Std deviations   :    0.0588 
CA50_40_ref_data2101_E04_spec0-70 #1                                    
             Lifetimes (ns)   :    0.4000    0.1250    2.0714
             Std deviations   :     Fixed     Fixed    0.0344 
             Intensities (%)  :   70.0338    9.0952   20.8710
             Std deviations   :    1.0308    0.8135    0.4009 
Time-zero    Channel number   :   41.5853
             Std deviations   :    0.0593 
CA50_40_ref_data2101_E04_spec0-70 #2                                    
             Lifetimes (ns)   :    0.4000    0.1250    2.0568
             Std deviations   :     Fixed     Fixed    0.0333 
             Intensities (%)  :   69.5963    8.7445   21.6592
             Std deviations   :    1.0411    0.8177    0.4072 
Time-zero    Channel number   :   41.5541
             Std deviations   :    0.0603 
CA50_40_ref_data2101_E04_spec0-70 #3                                    
             Lifetimes (ns)   :    0.4000    0.1250    2.0321
             Std deviations   :     Fixed     Fixed    0.0329 
             Intensities (%)  :   70.4228    8.0614   21.5158
             Std deviations   :    1.0497    0.8219    0.4105 
Time-zero    Channel number   :   41.4507
             Std deviations   :    0.0604 
CA50_40_ref_data2101_E04_spec0-70 #4                                    
             Lifetimes (ns)   :    0.4000    0.1250    2.0513
             Std deviations   :     Fixed     Fixed    0.0331 
             Intensities (%)  :   67.2025   11.0731   21.7244
             Std deviations   :    1.0204    0.7976    0.4057 
Time-zero    Channel number   :   41.6253
             Std deviations   :    0.0579 
CA50_40_ref_data2101_E04_spec0-70 #5                                  
...

into this (file called out):

#0
0.4000 0.1250 2.0446
['Fixed', 'Fixed', '0.0339']
69.2721 9.6726 21.0553
['1.0359', '0.8128', '0.4063']
41.5603
['0.0588', ' ', ' ']
#1
0.4000 0.1250 2.0714
['Fixed', 'Fixed', '0.0344']
70.0338 9.0952 20.8710
['1.0308', '0.8135', '0.4009']
41.5853
['0.0593', ' ', ' ']
#2
0.4000 0.1250 2.0568
['Fixed', 'Fixed', '0.0333']
69.5963 8.7445 21.6592
['1.0411', '0.8177', '0.4072']
41.5541
['0.0603', ' ', ' ']
#3
0.4000 0.1250 2.0321
['Fixed', 'Fixed', '0.0329']
70.4228 8.0614 21.5158
['1.0497', '0.8219', '0.4105']
41.4507
['0.0604', ' ', ' ']
#4
0.4000 0.1250 2.0513
['Fixed', 'Fixed', '0.0331']
67.2025 11.0731 21.7244
['1.0204', '0.7976', '0.4057']

So the first loop was needed to extract the necessary information from the first file, and now I am trying to get the 'out' file above in this form for easier comparison:

Dataset Lifetimes            Std deviations               Intensities            Std deviations                 Time-zero Std deviation
#0      0.4000 0.1250 2.0446 ['Fixed', 'Fixed', '0.0339'] 69.2721 9.6726 21.0553 ['1.0359', '0.8128', '0.4063'] 41.5603   ['0.0588', ' ', ' ']
#1      0.4000 0.1250 2.0714 ['Fixed', 'Fixed', '0.0344'] 70.0338 9.0952 20.8710 ['1.0308', '0.8135', '0.4009'] 41.5853   ['0.0593', ' ', ' ']
...

So basically I'm just trying to sort the 'out' file into columns, with every eighth value in the same column.

RE: Reading every nth line into a column from txt file - Yoriz - Jun-28-2021

output

Output:
CA50_40_ref_data2101_E04_spec0-70 #0
             Lifetimes (ns)   :    0.4000    0.1250    2.0446
             Std deviations   :     Fixed     Fixed    0.0339
             Intensities (%)  :   69.2721    9.6726   21.0553
             Std deviations   :    1.0359    0.8128    0.4063
Time-zero    Channel number   :   41.5603
             Std deviations   :    0.0588 
CA50_40_ref_data2101_E04_spec0-70 #1
             Lifetimes (ns)   :    0.4000    0.1250    2.0714
             Std deviations   :     Fixed     Fixed    0.0344
             Intensities (%)  :   70.0338    9.0952   20.8710
             Std deviations   :    1.0308    0.8135    0.4009
Time-zero    Channel number   :   41.5853
             Std deviations   :    0.0593
CA50_40_ref_data2101_E04_spec0-70 #2
             Lifetimes (ns)   :    0.4000    0.1250    2.0568
             Std deviations   :     Fixed     Fixed    0.0333
             Intensities (%)  :   69.5963    8.7445   21.6592
             Std deviations   :    1.0411    0.8177    0.4072
Time-zero    Channel number   :   41.5541
             Std deviations   :    0.0603
CA50_40_ref_data2101_E04_spec0-70 #3
             Lifetimes (ns)   :    0.4000    0.1250    2.0321
             Std deviations   :     Fixed     Fixed    0.0329
             Intensities (%)  :   70.4228    8.0614   21.5158
             Std deviations   :    1.0497    0.8219    0.4105
Time-zero    Channel number   :   41.4507
             Std deviations   :    0.0604
CA50_40_ref_data2101_E04_spec0-70 #4
             Lifetimes (ns)   :    0.4000    0.1250    2.0513
             Std deviations   :     Fixed     Fixed    0.0331
             Intensities (%)  :   67.2025   11.0731   21.7244
             Std deviations   :    1.0204    0.7976    0.4057
Time-zero    Channel number   :   41.6253
             Std deviations   :    0.0579

You may need to add some error checking

from itertools import zip_longest

HEADER = ('Dataset Lifetimes            Std deviations                    '
          'Intensities             Std deviations                    '
          'Time-zero Std deviation\n')


def grouper(iterable, n, fillvalue=''):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


class Dataset:
    def __init__(self, line: str) -> None:
        self.data = line.strip().split()[1]

    def __repr__(self) -> str:
        return f'{self.data:7}'


class LifeTimes:
    def __init__(self, line: str) -> None:
        self.data = line.strip().split()[3:6]

    def __repr__(self) -> str:
        return ' '.join(self.data)


class StdDeviations:
    def __init__(self, line: str) -> None:
        split_line = line.strip().split()
        self.data = [split_line[num] if split_line[num:]
                     else '' for num in range(3, 6)]

    def __repr__(self) -> str:
        return f"['{self.data[0]:^7}', '{self.data[1]:^7}', '{self.data[2]:^7}']"


class Intensities:
    def __init__(self, line: str) -> None:
        self.data = line.strip().split()[3:6]

    def __repr__(self) -> str:
        return f'{self.data[0]:>7} {self.data[1]:>7} {self.data[2]:>7}'


class TimeZero:
    def __init__(self, line: str) -> None:
        self.data = line.strip().split()[4]

    def __repr__(self) -> str:
        return f'{self.data:9}'


class Block:
    def __init__(self, dataset, lifetimes, std_deviations, intensities,
                 std_deviations2, time_zero, std_deviations3) -> None:
        self.dataset = Dataset(dataset)
        self.lifetimes = LifeTimes(lifetimes)
        self.std_deviations = StdDeviations(std_deviations)
        self.intensities = Intensities(intensities)
        self.std_deviations2 = StdDeviations(std_deviations2)
        self.time_zero = TimeZero(time_zero)
        self.std_deviations3 = StdDeviations(std_deviations3)

    def __repr__(self) -> str:
        return (f'{self.dataset} {self.lifetimes} {self.std_deviations}'
                f' {self.intensities} {self.std_deviations2}'
                f' {self.time_zero} {self.std_deviations3}\n')


with open('output') as file_in, open('out', 'w') as file_out:
    file_out.write(HEADER)
    for group in grouper(file_in, 7):
        file_out.write(str(Block(*group)))

out

Output:
Dataset Lifetimes            Std deviations                    Intensities             Std deviations                    Time-zero Std deviation
#0      0.4000 0.1250 2.0446 [' Fixed ', ' Fixed ', '0.0339 '] 69.2721  9.6726 21.0553 ['1.0359 ', '0.8128 ', '0.4063 '] 41.5603   ['0.0588 ', '       ', '       ']
#1      0.4000 0.1250 2.0714 [' Fixed ', ' Fixed ', '0.0344 '] 70.0338  9.0952 20.8710 ['1.0308 ', '0.8135 ', '0.4009 '] 41.5853   ['0.0593 ', '       ', '       ']
#2      0.4000 0.1250 2.0568 [' Fixed ', ' Fixed ', '0.0333 '] 69.5963  8.7445 21.6592 ['1.0411 ', '0.8177 ', '0.4072 '] 41.5541   ['0.0603 ', '       ', '       ']
#3      0.4000 0.1250 2.0321 [' Fixed ', ' Fixed ', '0.0329 '] 70.4228  8.0614 21.5158 ['1.0497 ', '0.8219 ', '0.4105 '] 41.4507   ['0.0604 ', '       ', '       ']
#4      0.4000 0.1250 2.0513 [' Fixed ', ' Fixed ', '0.0331 '] 67.2025 11.0731 21.7244 ['1.0204 ', '0.7976 ', '0.4057 '] 41.6253   ['0.0579 ', '       ', '       ']
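
As for the error checking mentioned above: with fillvalue='' an incomplete last group would reach Block with empty strings, and the indexing in Dataset would then raise an IndexError. One possible guard, only a sketch, is a variant of grouper that refuses short groups. The names complete_groups and _MISSING are hypothetical and not part of the code above.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking a slot that zip_longest had to fill


def complete_groups(iterable, n):
    """Yield tuples of n items; raise ValueError if the last group is short."""
    args = [iter(iterable)] * n
    for number, group in enumerate(zip_longest(*args, fillvalue=_MISSING)):
        if _MISSING in group:
            got = sum(item is not _MISSING for item in group)
            raise ValueError(f'group {number} has {got} lines, expected {n}')
        yield group


lines = [f'line {i}\n' for i in range(15)]  # 2 full groups of 7 + 1 leftover
try:
    for group in complete_groups(lines, 7):
        print(len(group))
except ValueError as err:
    print('Bad input:', err)
```

In the script above, complete_groups(file_in, 7) could stand in for grouper(file_in, 7) so that a truncated input file is reported instead of failing inside one of the classes.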

RE: Reading every nth line into a column from txt file - snippsat - Jun-29-2021

Looking at the data, it should be turned around; this is called transpose(). That is what you want if the data is going into Pandas for calculation, plotting etc.
If you just want to display the data, then Yoriz's method can work.

To give an example, just using the first record:

import pandas as pd

record = {}
with open('ca_data.txt') as f:
    header = next(f)
    for line in f:
        line = line.strip()
        line = line.replace('Time-zero    ', '')
        line = line.split(':')
        line_1 = line[0].strip()
        line_2 = ''.join(line[1:])
        record[line_1] = line_2.split()

# Read it like this so that empty values are filled in with None
df = pd.DataFrame.from_dict(record, orient='index')
print(df)

Output:
Lifetimes (ns)    0.4000  0.1250   2.0446
Std deviations    0.0588    None     None
Intensities (%)  69.2721  9.6726  21.0553
Channel number   41.5603    None     None

Now we can use transpose(), and then it will be a useful DataFrame.

>>> df = df.transpose()
>>> df
  Lifetimes (ns) Std deviations Intensities (%) Channel number
0         0.4000         0.0588         69.2721        41.5603
1         0.1250           None          9.6726           None
2         2.0446           None         21.0553           None
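
Note that in the output above only the last 'Std deviations' row survives, because all three rows share the same dictionary key. A possible way around that, and a way to handle every record rather than just the first, is sketched below with the standard library only. It assumes that each record starts with a line containing no colon and that every 'Std deviations' row belongs to the row just before it; parse_records and the ' std' suffix are made-up names.

```python
SAMPLE = """\
CA50_40_ref_data2101_E04_spec0-70 #0
             Lifetimes (ns)   :    0.4000    0.1250    2.0446
             Std deviations   :     Fixed     Fixed    0.0339
             Intensities (%)  :   69.2721    9.6726   21.0553
             Std deviations   :    1.0359    0.8128    0.4063
Time-zero    Channel number   :   41.5603
             Std deviations   :    0.0588
CA50_40_ref_data2101_E04_spec0-70 #1
             Lifetimes (ns)   :    0.4000    0.1250    2.0714
             Std deviations   :     Fixed     Fixed    0.0344
"""


def parse_records(text):
    """Return one dict per record, keyed by row label."""
    records = []
    previous = None
    for line in text.splitlines():
        if ':' not in line:
            # A non-empty line without a colon starts a new record ("... #0").
            if line.strip():
                records.append({'Dataset': line.split()[-1]})
            continue
        label, _, values = line.partition(':')
        label = label.replace('Time-zero', '').strip()
        if label == 'Std deviations':
            # Tie the deviations to the preceding row so keys stay unique.
            label = f'{previous} std'
        else:
            previous = label
        records[-1][label] = values.split()
    return records


for record in parse_records(SAMPLE):
    print(record)
```

The resulting list of dicts could then be passed to pd.DataFrame(records) if the data is needed in Pandas, giving one row per dataset.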

RE: Reading every nth line into a column from txt file - Laplace12 - Jun-29-2021

(Jun-28-2021, 11:20 PM)Yoriz Wrote: You may need to add some error checking [...]

Brilliant, this works perfectly! Using def/class is not familiar to me at all, but I have to take a much closer look at it, since it seems very useful for my purposes. It's also a huge plus to have reduced the number of files the code creates. Definitely something to try and learn better. Thank you!

(Jun-29-2021, 12:39 AM)snippsat Wrote: Looking at the data, it should be turned around; this is called transpose() [...]

Big thanks to you as well, I'll take a look at this method!