Python Forum
[Solved] Using readlines to read data file and sum columns
#1
Hey! I'm working with a data file with 5 columns. My aim is to read the first and last 20 lines of the file and calculate the sums of the first and last 20 values in the 2nd and 3rd columns. However, my code returns an error. It currently looks like this:

import numpy as np

count=open("C:/../file.hst", "r")
first=open("C:/../file.hst", "r")
last=open("C:/../file.hst", "r") #Not sure why I have to open the same file three times to get readlines working; nothing prints if I use just one handle (e.g. "count") for everything

line_count = 0
for line in count:
    if line!="\n":
        line_count +=1

print("Number of lines: ", line_count) #Prints the number of lines in the file so I know what values to use for lastlines; this part works

firstlines = first.readlines()
firstlines = firstlines[0:20] #Reads first 20 lines
lastlines = last.readlines()
lastlines = lastlines[1192:1212] #Reads last 20 lines

#Aim to sum the first and last 20 values in second and third column

d1first = np.sum(firstlines[:,1])/20
d2first = np.sum(firstlines[:,2])/20
sumfirst = d1first+d2first
d1last = np.sum(lastlines[:,1])/20
d2last = np.sum(lastlines[:,2])/20
sumlast = d1last+d2last
print("Average of first 20 (both detectors) :", sumfirst)
print("Average of last 20 (both detectors) :", sumlast)


The error I get is this:

    d1first = np.sum(firstlines[:,1])

TypeError: list indices must be integers or slices, not tuple
I've been able to use np.sum(data[:,1]) without any problem if I read the whole file, but for some reason this command is not working when I pick specific rows. Should I somehow create a loop to sum until the 20th value (or last 20 values) in the file, or how can I fix this issue? Any help is greatly appreciated!
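On the two puzzles in the code above: a file object is consumed as it is read, so once a loop or readlines() has gone through it, a second readlines() on the same handle returns an empty list. And readlines() gives a plain Python list of strings, which only accepts an integer or a slice as its index, hence the TypeError for [:,1]. Below is a minimal sketch of reading everything once and slicing the list. The in-memory `sample` text stands in for the real file.hst, and all variable names are made up for illustration.

```python
import io

# Hypothetical stand-in for file.hst: 30 tab-separated rows of 5 integers.
sample = "".join(
    f"{25 * i:10d}\t{i % 3:10d}\t{i % 2:10d}\t{-15000 + 25 * i:10d}\t{i % 5:10d}\n"
    for i in range(30)
)

with io.StringIO(sample) as handle:
    lines = handle.readlines()     # reads to the end of the stream
    leftover = handle.readlines()  # nothing left to read, so this is []

# Drop blank lines, then slice the list once for both ends.
rows = [line for line in lines if line.strip()]
first20 = rows[:20]   # first 20 rows
last20 = rows[-20:]   # last 20 rows, no line count needed

# A list of strings cannot be indexed like a 2-D array.
try:
    first20[:, 1]
except TypeError as err:
    message = str(err)
```

With a real file, the same pattern would use open(path) in place of io.StringIO(sample); the negative slice rows[-20:] avoids hard-coding 1192:1212.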
#2
Just a hunch, but I think you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))
#3
(Jun-16-2021, 09:27 AM)paul18fr Wrote: Just a hunch, but I think you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))

Hey, thanks for your reply! For some reason the code returns exactly the same error with both float() and int(), so perhaps these functions don't work on the output of readlines.

The data in the file looks like this:
         0	         2	         0	    -15000	         0
        25	         2	         1	    -14975	         1
        50	         2	         2	    -14950	         3
        75	         2	         0	    -14925	         3
and so on for 1000+ rows.

I have now added a few lines to the code to print "firstlines" and "lastlines", and I get this:

Number of lines:  1212
First 20 lines :
['         0\t         1\t         1\t    -15000\t         2\n', '        25\t         0\t         0\t    -14975\t         4\n', '        50\t         1\t         2\t    -14950\t         3\n', '        75\t         1\t         2\t    -14925\t         2\n', '       100\t         0\t         4\t    -14900\t         3\n', '       125\t         1\t         0\t    -14875\t         4\n', '       150\t         0\t         0\t    -14850\t         2\n', '       175\t         1\t         1\t    -14825\t         1\n', '       200\t         2\t         2\t    -14800\t         0\n', '       225\t         1\t         0\t    -14775\t         1\n', '       250\t         1\t         2\t    -14750\t         3\n', '       275\t         1\t         0\t    -14725\t         0\n', '       300\t         0\t         3\t    -14700\t         5\n', '       325\t         0\t         0\t    -14675\t         2\n', '       350\t         1\t         0\t    -14650\t         4\n', '       375\t         3\t         2\t    -14625\t         2\n', '       400\t         3\t         2\t    -14600\t         5\n', '       425\t         1\t         3\t    -14575\t         4\n', '       450\t         2\t         2\t    -14550\t         2\n', '       475\t         1\t         0\t    -14525\t         2\n']
Last 20 lines :
['     29525\t         0\t         0\t     14525\t         1\n', '     29550\t         0\t         0\t     14550\t         4\n', '     29575\t         0\t         0\t     14575\t         0\n', '     29600\t         0\t         0\t     14600\t         2\n', '     29625\t         0\t         1\t     14625\t         4\n', '     29650\t         0\t         0\t     14650\t         3\n', '     29675\t         0\t         0\t     14675\t         2\n', '     29700\t         0\t         0\t     14700\t         3\n', '     29725\t         0\t         1\t     14725\t         4\n', '     29750\t         1\t         0\t     14750\t         0\n', '     29775\t         0\t         0\t     14775\t         4\n', '     29800\t         0\t         0\t     14800\t         3\n', '     29825\t         0\t         0\t     14825\t         4\n', '     29850\t         0\t         0\t     14850\t         2\n', '     29875\t         0\t         0\t     14875\t         2\n', '     29900\t         0\t         0\t     14900\t         4\n', '     29925\t         1\t         1\t     14925\t         3\n', '     29950\t         0\t         1\t     14950\t         1\n', '     29975\t         0\t         0\t     14975\t         0\n', '     30000\t         0\t         0\t     15000\t         5\n']
The \t and \n are confusing me a little bit. Could they be the reason the sum command does not work? With data read by np.loadtxt, np.sum works perfectly fine.
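On the \t and \n question: they are just the tab and newline characters inside each string, and str.split() called with no arguments splits on any run of whitespace, so both disappear. A small sketch of turning such strings into a numeric array follows; the two rows are copied from the printed output above, and the names `raw_lines` and `table` are only for illustration.

```python
import numpy as np

# Two rows copied from the "First 20 lines" output above.
raw_lines = [
    "         0\t         1\t         1\t    -15000\t         2\n",
    "        25\t         0\t         0\t    -14975\t         4\n",
]

# split() with no argument handles spaces, tabs and the trailing newline alike.
table = np.array([[int(field) for field in line.split()] for line in raw_lines])

# Once the data is a 2-D array, column indexing with [:, k] works.
col2_mean = table[:, 1].mean()  # 2nd column
col3_mean = table[:, 2].mean()  # 3rd column
```

The [:, 1] indexing only becomes valid after the conversion to an array; on the original list of strings it raises the TypeError shown earlier.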
#4
I'm not a regex expert, but the following code might help you

import re

# line = " 0           2           0      -15000           0"
line = "50           2           2      -14950           3"
Values = re.split(r"\s?([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)", line)
print(f"1st value : {Values[1]}")
print(f"2nd value : {Values[2]}")
print(f"3rd value : {Values[3]}")
print(f"4th value : {Values[4]}")
print(f"5th value : {Values[5]}")
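A possibly simpler variant of the same idea, offered only as a sketch: re.findall with a single signed-integer pattern returns all the numbers on a line as a list, without needing one capture group per column. The names `values`, `second` and `third` are illustrative.

```python
import re

line = "50           2           2      -14950           3"

# Find every optionally signed integer on the line and convert it to int.
values = [int(token) for token in re.findall(r"[+-]?\d+", line)]

# Columns are zero-indexed, so the 2nd and 3rd columns are at positions 1 and 2.
second, third = values[1], values[2]
```

This assumes every field is a plain integer, as in the sample rows shown in the thread; decimal values would need a different pattern.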
#5
(Jun-16-2021, 12:13 PM)paul18fr Wrote: I'm not a regex expert, but the following code might help you

Got the issue resolved by using np.loadtxt and data[-1:1,1] type commands. Thanks for your help!
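For anyone finding this later, here is a rough sketch of what the np.loadtxt approach can look like. The exact slices used are not shown in the thread, so `data[:n, 1]` and `data[-n:, 1]` are an assumption about what was meant; the in-memory `text` is a made-up stand-in for file.hst, and `first_avg` and `last_avg` are illustrative names.

```python
import io
import numpy as np

# Hypothetical stand-in for file.hst: 40 whitespace-separated rows, 5 columns.
text = "\n".join(
    f"{25 * i}\t{i % 3}\t{i % 2}\t{-15000 + 25 * i}\t{i % 5}" for i in range(40)
)

# loadtxt splits on whitespace by default and returns a 2-D float array.
data = np.loadtxt(io.StringIO(text))

n = 20
# Mean of the 2nd column plus mean of the 3rd column, for each end of the file.
first_avg = data[:n, 1].mean() + data[:n, 2].mean()
last_avg = data[-n:, 1].mean() + data[-n:, 2].mean()

print("Average of first 20 (both detectors) :", first_avg)
print("Average of last 20 (both detectors) :", last_avg)
```

With the real file, np.loadtxt would take the file path instead of the StringIO object. The negative slice picks the last n rows whatever the file length, so the line count does not have to be known in advance.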

