[Solved] Using readlines to read data file and sum columns

Laplace12 · (This post was last modified: Jun-16-2021, 12:57 PM by Laplace12.)

Hey! I'm working with a data file with 5 columns. I want to read the first and last 20 lines of the file and sum the first and last 20 values in the 2nd and 3rd columns. However, my code raises an error. It currently looks like this:

count=open("C:/../file.hst", "r")
first=open("C:/../file.hst", "r")
last=open("C:/../file.hst", "r") #Not sure why I have to open the same file three times to get readlines working, won't print if I just use one file (f.ex. "count" for all)

line_count = 0
for line in count:
    if line!="\n":
        line_count +=1

print("Number of lines: ", line_count) #Prints number of lines in file so I know what value to put in lastlines, this command works

firstlines = first.readlines()
firstlines = firstlines[0:19] #Reads first 20 lines
lastlines = last.readlines()
lastlines = lastlines[1192:1212] #Reads last 20 lines

#Aim to sum the first and last 20 values in second and third column

d1first = np.sum(firstlines[:,1])/20
d2first = np.sum(firstlines[:,2])/20
sumfirst = d1first+d2first
d1last = np.sum(lastlines[:,1])/20
d2last = np.sum(lastlines[:,2])/20
sumlast = d1last+d2last
print("Average of first 20 (both detectors) :", sumfirst)
print("Average of last 20 (both detectors) :", sumlast)

The error I get is this:

    d1first = np.sum(firstlines[:,1])

TypeError: list indices must be integers or slices, not tuple

I've been able to use np.sum(data[:,1]) without any problem if I read the whole file, but for some reason this command is not working when I pick specific rows. Should I somehow create a loop to sum until the 20th value (or last 20 values) in the file, or how can I fix this issue? Any help is greatly appreciated!
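A note on the two side questions in this post (the three open() calls and the slice bounds). A file object is exhausted once it has been iterated over or read to the end, so a second readlines() on the same handle returns an empty list unless the handle is rewound with seek(0). Reading the lines once and slicing the resulting list avoids opening the file several times. The sketch below uses io.StringIO and made-up rows as a stand-in for the real .hst file, so the names sample_text and handle are assumptions, not part of the original code.

```python
import io

# Hypothetical stand-in for the .hst file: 30 tab-separated rows, 5 columns.
sample_text = "".join(
    f"{25 * i}\t{i % 3}\t{i % 2}\t{25 * i - 15000}\t{i % 5}\n" for i in range(30)
)
handle = io.StringIO(sample_text)

all_lines = handle.readlines()    # consumes the whole stream
second_read = handle.readlines()  # nothing left to read, so this is []

handle.seek(0)                    # rewinding makes the content readable again
reread = handle.readlines()

first20 = all_lines[:20]          # indices 0..19, i.e. 20 lines
last20 = all_lines[-20:]          # last 20 lines, whatever the file length
nineteen = all_lines[0:19]        # the stop index is exclusive: only 19 lines

print(len(all_lines), len(second_read), len(reread))
print(len(first20), len(last20), len(nineteen))
```

Negative slicing also removes the need to count the lines first just to find the start index of the last 20.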

paul18fr · Jun-16-2021, 09:27 AM

It's just a hunch, but I think you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))

Laplace12 · Jun-16-2021, 09:42 AM

(Jun-16-2021, 09:27 AM)paul18fr Wrote: It's just a hunch, but I think you should first convert your data to float or int (your data are strings) before using numpy
d1first = np.sum(float(firstlines[:,1]))

Hey, thanks for your reply! For some reason the code returns the exact same error with both float() and int(), so perhaps these functions don't work properly on the output of readlines.
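A likely reason the error does not change: Python evaluates the subscript firstlines[:,1] before it calls float(), and a plain list rejects a tuple index such as [:, 1], so float() is never reached. Two-dimensional indexing of that kind is a NumPy array feature. The small sketch below uses made-up numbers (rows_as_list is a hypothetical name) to show the difference.

```python
import numpy as np

rows_as_list = [[0, 2, 0], [25, 2, 1], [50, 2, 2]]  # made-up values

try:
    float(rows_as_list[:, 1])  # the subscript is evaluated first and fails
except TypeError as exc:
    message = str(exc)

rows_as_array = np.array(rows_as_list)
column_sum = rows_as_array[:, 1].sum()  # [:, 1] is valid on a 2-D ndarray

print(message)
print(column_sum)
```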

The data in a data file looks like this:

         0	         2	         0	    -15000	         0
        25	         2	         1	    -14975	         1
        50	         2	         2	    -14950	         3
        75	         2	         0	    -14925	         3

and so on for 1000+ rows.

I now added a few lines in the code to print the "firstlines" and "lastlines", and I get this:

Number of lines:  1212
First 20 lines :
['         0\t         1\t         1\t    -15000\t         2\n', '        25\t         0\t         0\t    -14975\t         4\n', '        50\t         1\t         2\t    -14950\t         3\n', '        75\t         1\t         2\t    -14925\t         2\n', '       100\t         0\t         4\t    -14900\t         3\n', '       125\t         1\t         0\t    -14875\t         4\n', '       150\t         0\t         0\t    -14850\t         2\n', '       175\t         1\t         1\t    -14825\t         1\n', '       200\t         2\t         2\t    -14800\t         0\n', '       225\t         1\t         0\t    -14775\t         1\n', '       250\t         1\t         2\t    -14750\t         3\n', '       275\t         1\t         0\t    -14725\t         0\n', '       300\t         0\t         3\t    -14700\t         5\n', '       325\t         0\t         0\t    -14675\t         2\n', '       350\t         1\t         0\t    -14650\t         4\n', '       375\t         3\t         2\t    -14625\t         2\n', '       400\t         3\t         2\t    -14600\t         5\n', '       425\t         1\t         3\t    -14575\t         4\n', '       450\t         2\t         2\t    -14550\t         2\n', '       475\t         1\t         0\t    -14525\t         2\n']
Last 20 lines :
['     29525\t         0\t         0\t     14525\t         1\n', '     29550\t         0\t         0\t     14550\t         4\n', '     29575\t         0\t         0\t     14575\t         0\n', '     29600\t         0\t         0\t     14600\t         2\n', '     29625\t         0\t         1\t     14625\t         4\n', '     29650\t         0\t         0\t     14650\t         3\n', '     29675\t         0\t         0\t     14675\t         2\n', '     29700\t         0\t         0\t     14700\t         3\n', '     29725\t         0\t         1\t     14725\t         4\n', '     29750\t         1\t         0\t     14750\t         0\n', '     29775\t         0\t         0\t     14775\t         4\n', '     29800\t         0\t         0\t     14800\t         3\n', '     29825\t         0\t         0\t     14825\t         4\n', '     29850\t         0\t         0\t     14850\t         2\n', '     29875\t         0\t         0\t     14875\t         2\n', '     29900\t         0\t         0\t     14900\t         4\n', '     29925\t         1\t         1\t     14925\t         3\n', '     29950\t         0\t         1\t     14950\t         1\n', '     29975\t         0\t         0\t     14975\t         0\n', '     30000\t         0\t         0\t     15000\t         5\n']

The \t and \n are confusing me a little bit. Could they be the reason the sum command does not work? With np.loadtxt, np.sum works perfectly fine.
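The \t and \n are just the tab separators and line endings of the file as they appear inside Python strings. The real obstacle is that readlines() gives a list of strings rather than a numeric array. One possible bridge, sketched here on four of the lines printed above: np.loadtxt also accepts a list of strings, and its default delimiter is any whitespace, so tabs and padding spaces are handled. This is a sketch under those assumptions, not necessarily what the poster ended up using.

```python
import numpy as np

# Four lines in the shape readlines() returned (copied from the printed output).
firstlines = [
    '         0\t         1\t         1\t    -15000\t         2\n',
    '        25\t         0\t         0\t    -14975\t         4\n',
    '        50\t         1\t         2\t    -14950\t         3\n',
    '        75\t         1\t         2\t    -14925\t         2\n',
]

data = np.loadtxt(firstlines)   # parses the strings into a 2-D float array

second_col_sum = data[:, 1].sum()
third_col_sum = data[:, 2].sum()

print(data.shape)
print(second_col_sum, third_col_sum)
```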

paul18fr · Jun-16-2021, 12:13 PM

I'm not a regex expert, but the following code might help you

import re

# line = " 0           2           0      -15000           0"
line = "50           2           2      -14950           3"
Values = re.split(r"\s?([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)", line)
print(f"1st value : {Values[1]}")
print(f"2nd value : {Values[2]}")
print(f"3rd value : {Values[3]}")
print(f"4th value : {Values[4]}")
print(f"5th value : {Values[5]}")
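For whitespace-separated integers like these, a regex is probably more than is needed. As an alternative sketch (not the code from the post), str.split() with no arguments splits on any run of spaces, tabs or newlines and ignores leading and trailing whitespace, and int() converts each token, including the negative ones.

```python
line = "50           2           2      -14950           3"
values = [int(token) for token in line.split()]

# The same call copes with the tab-separated form returned by readlines().
raw_line = '        25\t         2\t         1\t    -14975\t         1\n'
raw_values = [int(token) for token in raw_line.split()]

print(values)
print(raw_values)
```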

Laplace12 · Jun-16-2021, 12:46 PM

(Jun-16-2021, 12:13 PM)paul18fr Wrote: I'm not a regex expert, but the following code might help you

Got the issue resolved by using np.loadtxt and data[-1:1,1] type commands. Thanks for your help!
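For later readers, a minimal sketch of what such an np.loadtxt solution might look like. The exact slices used are not shown in the thread, so the ones below are an assumption: data[:20, 1] selects the first 20 rows of the second column and data[-20:, 1] the last 20. A slice written literally as data[-1:1, 1] would be empty on an array with more than one row. The io.StringIO object and the generated rows are hypothetical stand-ins for the real file.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: 50 rows, 5 tab-separated columns.
rows = [f"{25 * i}\t{i % 4}\t{i % 3}\t{25 * i - 15000}\t{i % 6}" for i in range(50)]
source = io.StringIO("\n".join(rows) + "\n")

data = np.loadtxt(source)  # 2-D float array, one row per line

# Per-detector averages over the first and last 20 rows, then summed,
# mirroring np.sum(...)/20 for each column in the original post.
first_avg = data[:20, 1].mean() + data[:20, 2].mean()
last_avg = data[-20:, 1].mean() + data[-20:, 2].mean()

empty_slice = data[-1:1, 1]  # start is past stop, so nothing is selected

print("Average of first 20 (both detectors) :", first_avg)
print("Average of last 20 (both detectors) :", last_avg)
print(empty_slice.size)
```

With a real file, the path string can be passed to np.loadtxt directly in place of the StringIO object.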
