Python Forum

Full Version: [Solved] Using readlines to read data file and sum columns
Hey! I'm working with a data file with 5 columns. My aim is to read the first and last 20 lines of the file and calculate the sums of the first and last 20 values in the 2nd and 3rd columns. However, my code returns an error. It currently looks like this:

import numpy as np

count=open("C:/../file.hst", "r")
first=open("C:/../file.hst", "r")
last=open("C:/../file.hst", "r") #Not sure why I have to open the same file three times to get readlines working, won't print if I just use one file (f.ex. "count" for all)

line_count = 0
for line in count:
    if line!="\n":
        line_count +=1

print("Number of lines: ", line_count) #Prints number of lines in file so I know what value to put in lastlines, this command works

firstlines = first.readlines()
firstlines = firstlines[0:20] #Reads first 20 lines
lastlines = last.readlines()
lastlines = lastlines[1192:1212] #Reads last 20 lines

#Aim to sum the first and last 20 values in second and third column

d1first = np.sum(firstlines[:,1])/20
d2first = np.sum(firstlines[:,2])/20
sumfirst = d1first+d2first
d1last = np.sum(lastlines[:,1])/20
d2last = np.sum(lastlines[:,2])/20
sumlast = d1last+d2last
print("Average of first 20 (both detectors) :", sumfirst)
print("Average of last 20 (both detectors) :", sumlast)
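
On the comment in the code about needing three open() calls: a file object is an iterator, so once the counting loop has consumed it, a later readlines() on the same handle has nothing left to return. A minimal sketch, assuming a small temporary file in place of the real .hst file (the contents and the temporary path are made up for illustration), shows the effect and two ways around it:

```python
import os
import tempfile

# Hypothetical stand-in for the .hst file: 5 tab-separated columns, 30 rows.
with tempfile.NamedTemporaryFile("w", suffix=".hst", delete=False) as tmp:
    for i in range(30):
        tmp.write(f"{i * 25}\t{i % 3}\t{i % 2}\t{i * 25 - 15000}\t{i % 5}\n")
    path = tmp.name

with open(path, "r") as f:
    line_count = sum(1 for line in f if line != "\n")  # consumes the iterator
    exhausted = f.readlines()   # nothing left to read, so this is empty
    f.seek(0)                   # rewind to the start of the file
    rewound = f.readlines()     # all lines are available again

# Simpler: read once, then slice the list as often as needed.
with open(path, "r") as f:
    lines = f.readlines()
firstlines = lines[:20]
lastlines = lines[-20:]

os.remove(path)
print(line_count, len(exhausted), len(rewound), len(firstlines), len(lastlines))
```

Reading once and using lines[-20:] also removes the need to know the line count in advance.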


The error I get is this:

    d1first = np.sum(firstlines[:,1])

TypeError: list indices must be integers or slices, not tuple
I've been able to use np.sum(data[:,1]) without any problem if I read the whole file, but for some reason this command is not working when I pick specific rows. Should I somehow create a loop to sum until the 20th value (or last 20 values) in the file, or how can I fix this issue? Any help is greatly appreciated!
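
The error comes from the type of firstlines: readlines() returns a plain Python list of strings, and a list does not accept the two-part index [:, 1] that a NumPy array does. np.loadtxt returns an array, which is why the same expression works on the whole file. A small sketch with made-up values (the names rows and arr are only for illustration):

```python
import numpy as np

rows = [[0, 2, 0], [25, 2, 1], [50, 2, 2]]  # made-up values, plain nested list

try:
    rows[:, 1]                 # lists accept only an int or a slice as index
except TypeError as exc:
    message = str(exc)

arr = np.array(rows)           # a 2-D array supports [row, column] indexing
second_column = arr[:, 1]
total = int(np.sum(second_column))

print(message)
print(second_column, total)
```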
Just a hunch, but I guess you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))
(Jun-16-2021, 09:27 AM) paul18fr wrote: Just a hunch, but I guess you should first convert your data to float or int (your data are strings) before using numpy


Hey, thanks for your reply! With both float() and int() the code returns exactly the same error, so perhaps these functions don't work properly on the output of readlines.
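
A likely reason the error is identical: Python evaluates the argument firstlines[:,1] before it calls float() or int(), so the TypeError is raised by the list indexing and the conversion never runs. A tiny sketch, using a hypothetical to_float wrapper that records whether it was ever called:

```python
lines = ["0\t2\t0\n", "25\t2\t1\n"]   # what readlines() returns: a list of str
calls = []                             # records whether the converter ever ran

def to_float(value):
    calls.append(value)
    return float(value)

try:
    to_float(lines[:, 1])   # the index is evaluated first and raises here
except TypeError as exc:
    error = str(exc)

print(calls, error)
```

So the conversion would have to happen per field, once each line has been split into its columns.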

The data in the file looks like this:
         0	         2	         0	    -15000	         0
        25	         2	         1	    -14975	         1
        50	         2	         2	    -14950	         3
        75	         2	         0	    -14925	         3
and so on for 1000+ rows.

I now added a few lines in the code to print the "firstlines" and "lastlines", and I get this:

Number of lines:  1212
First 20 lines :
['         0\t         1\t         1\t    -15000\t         2\n', '        25\t         0\t         0\t    -14975\t         4\n', '        50\t         1\t         2\t    -14950\t         3\n', '        75\t         1\t         2\t    -14925\t         2\n', '       100\t         0\t         4\t    -14900\t         3\n', '       125\t         1\t         0\t    -14875\t         4\n', '       150\t         0\t         0\t    -14850\t         2\n', '       175\t         1\t         1\t    -14825\t         1\n', '       200\t         2\t         2\t    -14800\t         0\n', '       225\t         1\t         0\t    -14775\t         1\n', '       250\t         1\t         2\t    -14750\t         3\n', '       275\t         1\t         0\t    -14725\t         0\n', '       300\t         0\t         3\t    -14700\t         5\n', '       325\t         0\t         0\t    -14675\t         2\n', '       350\t         1\t         0\t    -14650\t         4\n', '       375\t         3\t         2\t    -14625\t         2\n', '       400\t         3\t         2\t    -14600\t         5\n', '       425\t         1\t         3\t    -14575\t         4\n', '       450\t         2\t         2\t    -14550\t         2\n', '       475\t         1\t         0\t    -14525\t         2\n']
Last 20 lines :
['     29525\t         0\t         0\t     14525\t         1\n', '     29550\t         0\t         0\t     14550\t         4\n', '     29575\t         0\t         0\t     14575\t         0\n', '     29600\t         0\t         0\t     14600\t         2\n', '     29625\t         0\t         1\t     14625\t         4\n', '     29650\t         0\t         0\t     14650\t         3\n', '     29675\t         0\t         0\t     14675\t         2\n', '     29700\t         0\t         0\t     14700\t         3\n', '     29725\t         0\t         1\t     14725\t         4\n', '     29750\t         1\t         0\t     14750\t         0\n', '     29775\t         0\t         0\t     14775\t         4\n', '     29800\t         0\t         0\t     14800\t         3\n', '     29825\t         0\t         0\t     14825\t         4\n', '     29850\t         0\t         0\t     14850\t         2\n', '     29875\t         0\t         0\t     14875\t         2\n', '     29900\t         0\t         0\t     14900\t         4\n', '     29925\t         1\t         1\t     14925\t         3\n', '     29950\t         0\t         1\t     14950\t         1\n', '     29975\t         0\t         0\t     14975\t         0\n', '     30000\t         0\t         0\t     15000\t         5\n']
The \t and \n are confusing me a little bit. Could they be the reason the sum command does not work? With np.loadtxt, np.sum works perfectly fine.
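
The \t and \n are just the tab separators and the line ending stored in each string; they are not the cause of the TypeError, but they do show that each line is still one unparsed string. A sketch of one way to keep readlines() and still get a numeric array, assuming made-up lines in the same padded, tab-separated layout as the printout above:

```python
import numpy as np

# Made-up lines in the same layout as the post: padded fields separated by tabs.
lines = [
    f"{i * 25:>10}\t{i % 3:>10}\t{i % 2:>10}\t{i * 25 - 15000:>10}\t{i % 5:>10}\n"
    for i in range(50)
]

# str.split() with no argument splits on any run of whitespace, so the
# padding spaces, the tabs and the trailing newline all disappear.
data = np.array(
    [[int(field) for field in line.split()] for line in lines if line.strip()]
)

first, last = data[:20], data[-20:]
avg_first = first[:, 1].mean() + first[:, 2].mean()   # 2nd + 3rd column
avg_last = last[:, 1].mean() + last[:, 2].mean()
print(data.shape, avg_first, avg_last)
```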
I'm not a regex expert, but the following code might help you

import re

# line = " 0           2           0      -15000           0"
line = "50           2           2      -14950           3"
Values = re.split(r"\s?([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)", line)
print(f"1st value : {Values[1]}")
print(f"2nd value : {Values[2]}")
print(f"3rd value : {Values[3]}")
print(f"4th value : {Values[4]}")
print(f"5th value : {Values[5]}")
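
In the snippet above the values start at index 1 because re.split puts the text before the match at index 0. If a regex is wanted at all, a shorter variant (a sketch, not the poster's method) is to match one signed integer at a time with re.findall:

```python
import re

line = "50           2           2      -14950           3"

# One pattern for a single signed integer, applied repeatedly across the line.
values = [int(token) for token in re.findall(r"[+-]?\d+", line)]
print(values)
```

This does not depend on the number of columns, and the results are already integers rather than strings.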
(Jun-16-2021, 12:13 PM) paul18fr wrote: I'm not a regex expert, but the following code might help you


Got the issue resolved by using np.loadtxt and data[-1:1,1] type commands. Thanks for your help!
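
For anyone landing here later, a sketch of what the np.loadtxt route could look like. The exact slices used in the final code were not posted; taken literally, data[-1:1,1] selects nothing on an array with more than one row, so data[:20, 1] and data[-20:, 1] below are an assumption about what was meant. The in-memory text is made up and stands in for the real file.

```python
import io
import numpy as np

# Made-up stand-in for the data file: 40 rows, 5 tab-separated columns.
text = "".join(
    f"{i * 25}\t{i % 4}\t{i % 2}\t{i * 25 - 15000}\t{i % 5}\n" for i in range(40)
)

# loadtxt splits on whitespace by default and returns a 2-D float array.
data = np.loadtxt(io.StringIO(text))

d1first = np.sum(data[:20, 1]) / 20    # first 20 values, 2nd column
d2first = np.sum(data[:20, 2]) / 20    # first 20 values, 3rd column
d1last = np.sum(data[-20:, 1]) / 20    # last 20 values, 2nd column
d2last = np.sum(data[-20:, 2]) / 20    # last 20 values, 3rd column

sumfirst = d1first + d2first
sumlast = d1last + d2last
print("Average of first 20 (both detectors) :", sumfirst)
print("Average of last 20 (both detectors) :", sumlast)
```

Negative indices count from the end, so data[-20:] works without knowing the number of rows.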