[Solved] Using readlines to read data file and sum columns

Laplace12 · (This post was last modified: Jun-16-2021, 12:57 PM by Laplace12.)

Hey! I'm working with a data file with 5 columns. My aim is to read the first and last 20 lines of the file and calculate the sums of the first and last 20 values in the 2nd and 3rd columns. However, my code returns an error. It currently looks like this:

import numpy as np

count=open("C:/../file.hst", "r")
first=open("C:/../file.hst", "r")
last=open("C:/../file.hst", "r") #Not sure why I have to open the same file three times to get readlines working, won't print if I just use one file (f.ex. "count" for all)

line_count = 0
for line in count:
    if line!="\n":
        line_count +=1

print("Number of lines: ", line_count) #Prints number of lines in file so I know what value to put in lastlines, this command works

firstlines = first.readlines()
firstlines = firstlines[0:20] #Reads first 20 lines
lastlines = last.readlines()
lastlines = lastlines[1192:1212] #Reads last 20 lines

#Aim to sum the first and last 20 values in second and third column

d1first = np.sum(firstlines[:,1])/20
d2first = np.sum(firstlines[:,2])/20
sumfirst = d1first+d2first
d1last = np.sum(lastlines[:,1])/20
d2last = np.sum(lastlines[:,2])/20
sumlast = d1last+d2last
print("Average of first 20 (both detectors) :", sumfirst)
print("Average of last 20 (both detectors) :", sumlast)

The error I get is this:

    d1first = np.sum(firstlines[:,1])

TypeError: list indices must be integers or slices, not tuple

I've been able to use np.sum(data[:,1]) without any problem if I read the whole file, but for some reason this command is not working when I pick specific rows. Should I somehow create a loop to sum until the 20th value (or last 20 values) in the file, or how can I fix this issue? Any help is greatly appreciated!
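A minimal sketch of why the error appears, using two made-up rows rather than the real file: readlines() returns a plain Python list of strings, and a list does not accept the two-axis index [:, 1] (Python passes it as a tuple), whereas a 2-D NumPy array such as the one np.loadtxt returns does. The names rows and table are only illustrative.

```python
import numpy as np

# Two made-up rows in the same tab-separated layout as the data file
rows = ["0\t2\t0\t-15000\t0\n", "25\t2\t1\t-14975\t1\n"]

# A list only accepts an integer or a slice, so [:, 1] raises TypeError
try:
    rows[:, 1]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A 2-D NumPy array supports [row_slice, column_index]
table = np.array([row.split() for row in rows], dtype=float)
second_column_sum = table[:, 1].sum()
print(second_column_sum)
```

Wrapping the expression in float() or int() would not change the outcome, because the index firstlines[:,1] is evaluated (and fails) before the conversion is ever called.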

paul18fr · Jun-16-2021, 09:27 AM

It's just a hunch, but I guess you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))

Laplace12 · Jun-16-2021, 09:42 AM

(Jun-16-2021, 09:27 AM)paul18fr Wrote: It's just a hunch, but I guess you should first convert your data to float or int (your data are strings) before using numpy
d1first = np.sum(float(firstlines[:,1]))

Hey, thanks for your reply! For some reason with both float() and int() the code returns the same exact error, so perhaps the commands are not properly working for readlines.

The data in a data file looks like this:

         0	         2	         0	    -15000	         0
        25	         2	         1	    -14975	         1
        50	         2	         2	    -14950	         3
        75	         2	         0	    -14925	         3

and so on for +1000 rows.

I now added a few lines in the code to print the "firstlines" and "lastlines", and I get this:

Number of lines:  1212
First 20 lines :
['         0\t         1\t         1\t    -15000\t         2\n', '        25\t         0\t         0\t    -14975\t         4\n', '        50\t         1\t         2\t    -14950\t         3\n', '        75\t         1\t         2\t    -14925\t         2\n', '       100\t         0\t         4\t    -14900\t         3\n', '       125\t         1\t         0\t    -14875\t         4\n', '       150\t         0\t         0\t    -14850\t         2\n', '       175\t         1\t         1\t    -14825\t         1\n', '       200\t         2\t         2\t    -14800\t         0\n', '       225\t         1\t         0\t    -14775\t         1\n', '       250\t         1\t         2\t    -14750\t         3\n', '       275\t         1\t         0\t    -14725\t         0\n', '       300\t         0\t         3\t    -14700\t         5\n', '       325\t         0\t         0\t    -14675\t         2\n', '       350\t         1\t         0\t    -14650\t         4\n', '       375\t         3\t         2\t    -14625\t         2\n', '       400\t         3\t         2\t    -14600\t         5\n', '       425\t         1\t         3\t    -14575\t         4\n', '       450\t         2\t         2\t    -14550\t         2\n', '       475\t         1\t         0\t    -14525\t         2\n']
Last 20 lines :
['     29525\t         0\t         0\t     14525\t         1\n', '     29550\t         0\t         0\t     14550\t         4\n', '     29575\t         0\t         0\t     14575\t         0\n', '     29600\t         0\t         0\t     14600\t         2\n', '     29625\t         0\t         1\t     14625\t         4\n', '     29650\t         0\t         0\t     14650\t         3\n', '     29675\t         0\t         0\t     14675\t         2\n', '     29700\t         0\t         0\t     14700\t         3\n', '     29725\t         0\t         1\t     14725\t         4\n', '     29750\t         1\t         0\t     14750\t         0\n', '     29775\t         0\t         0\t     14775\t         4\n', '     29800\t         0\t         0\t     14800\t         3\n', '     29825\t         0\t         0\t     14825\t         4\n', '     29850\t         0\t         0\t     14850\t         2\n', '     29875\t         0\t         0\t     14875\t         2\n', '     29900\t         0\t         0\t     14900\t         4\n', '     29925\t         1\t         1\t     14925\t         3\n', '     29950\t         0\t         1\t     14950\t         1\n', '     29975\t         0\t         0\t     14975\t         0\n', '     30000\t         0\t         0\t     15000\t         5\n']

The \t and \n are confusing me a little bit; could they be the reason the sum command does not work? With np.loadtxt, np.sum works perfectly fine.
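For what it's worth, \t is a tab and \n is the newline at the end of each line, so every list element is still one unsplit string. A rough sketch of turning such lines into numbers, assuming made-up rows held in an io.StringIO stand-in for the real file (to_array and n are hypothetical names, and n is 2 here only to keep the toy data short):

```python
import io
import numpy as np

# Stand-in for the real file: four made-up rows, tab-separated with padding
fake_file = io.StringIO(
    "         0\t         2\t         0\t    -15000\t         0\n"
    "        25\t         2\t         1\t    -14975\t         1\n"
    "        50\t         2\t         2\t    -14950\t         3\n"
    "        75\t         2\t         0\t    -14925\t         3\n"
)

# One read is enough: the resulting list can be sliced as often as needed
lines = fake_file.readlines()

def to_array(text_lines):
    """Split each non-blank line on whitespace and convert the fields to float."""
    return np.array([ln.split() for ln in text_lines if ln.strip()], dtype=float)

n = 2  # the thread uses 20
first = to_array(lines[:n])   # first n rows
last = to_array(lines[-n:])   # last n rows, no hard-coded line numbers needed

# Sum of the 2nd and 3rd columns (indices 1 and 2)
first_sum = first[:, 1].sum() + first[:, 2].sum()
last_sum = last[:, 1].sum() + last[:, 2].sum()
print(first_sum, last_sum)
```

str.split() with no argument splits on any run of whitespace, so the tabs, padding spaces and trailing newline all disappear. Reading once also sidesteps the need to open the file three times: a file object can only be iterated through once unless it is rewound with seek(0), which is why a second pass over the same handle returns nothing.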

paul18fr · Jun-16-2021, 12:13 PM

I'm not a regex expert, but the following code might help you

import re

# line = " 0           2           0      -15000           0"
line = "50           2           2      -14950           3"
# Each (...) group captures one optionally signed integer; re.split returns
# an empty string first, followed by the five captured values
Values = re.split(r"\s?([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)", line)
print(f"1st value : {Values[1]}")
print(f"2nd value : {Values[2]}")
print(f"3rd value : {Values[3]}")
print(f"4th value : {Values[4]}")
print(f"5th value : {Values[5]}")

Laplace12 · Jun-16-2021, 12:46 PM

(Jun-16-2021, 12:13 PM)paul18fr Wrote: I'm not a regex expert, but the following code might help you

Got the issue resolved by using np.loadtxt and data[-1:1,1] type commands. Thanks for your help!
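The final code was not posted, so the following is only a sketch of what an np.loadtxt-based version might look like, assuming made-up rows in an io.StringIO stand-in for the real file and assuming slices of the form data[:n, 1] and data[-n:, 1] for the first and last n rows. Taken literally, data[-1:1, 1] would select no rows on an array of this length, so the slices actually used were presumably of this other form.

```python
import io
import numpy as np

# Stand-in for the real file (made-up values, same five-column layout)
fake_file = io.StringIO(
    "0\t2\t0\t-15000\t0\n"
    "25\t2\t1\t-14975\t1\n"
    "50\t2\t2\t-14950\t3\n"
    "75\t2\t0\t-14925\t3\n"
)

# loadtxt splits on whitespace by default and returns a 2-D float array
data = np.loadtxt(fake_file)

n = 2  # the thread uses 20; 2 keeps the toy example small

# Average of the 2nd and 3rd columns over the first and last n rows
first_avg = data[:n, 1].mean() + data[:n, 2].mean()
last_avg = data[-n:, 1].mean() + data[-n:, 2].mean()

print("Average of first", n, "(both detectors) :", first_avg)
print("Average of last", n, "(both detectors) :", last_avg)
```

With the real file, the path string would be passed to np.loadtxt in place of fake_file, and n would be 20. Negative slicing also removes the need to count the lines first.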
