Python Forum
[Solved] Using readlines to read data file and sum columns
#1
Hey! I'm working with a data file with 5 columns. I want to read the first and last 20 lines of the file and calculate the sums of the first and last 20 values in the 2nd and 3rd columns. However, my code raises an error. It currently looks like this:

import numpy as np

count=open("C:/../file.hst", "r")
first=open("C:/../file.hst", "r")
last=open("C:/../file.hst", "r") #Not sure why I have to open the same file three times to get readlines working, won't print if I just use one file (f.ex. "count" for all)

line_count = 0
for line in count:
    if line!="\n":
        line_count +=1

print("Number of lines: ", line_count) #Prints number of lines in file so I know what value to put in lastlines, this command works

firstlines = first.readlines()
firstlines = firstlines[0:20] #Reads first 20 lines
lastlines = last.readlines()
lastlines = lastlines[1192:1212] #Reads last 20 lines

#Aim to sum the first and last 20 values in second and third column

d1first = np.sum(firstlines[:,1])/20
d2first = np.sum(firstlines[:,2])/20
sumfirst = d1first+d2first
d1last = np.sum(lastlines[:,1])/20
d2last = np.sum(lastlines[:,2])/20
sumlast = d1last+d2last
print("Average of first 20 (both detectors) :", sumfirst)
print("Average of last 20 (both detectors) :", sumlast)


The error I get is this:

    d1first = np.sum(firstlines[:,1])

TypeError: list indices must be integers or slices, not tuple
I've been able to use np.sum(data[:,1]) without any problem if I read the whole file, but for some reason this command is not working when I pick specific rows. Should I somehow create a loop to sum until the 20th value (or last 20 values) in the file, or how can I fix this issue? Any help is greatly appreciated!
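On the comment about needing three file handles: a file object is an iterator that is consumed as it is read, so once the counting loop has reached the end of the file, a later readlines() on the same handle returns an empty list. Below is a minimal sketch of reading the lines once and slicing the list instead. It uses an in-memory io.StringIO object with made-up rows as a stand-in for the real file, which is an assumption for illustration only.

```python
import io

# Hypothetical stand-in for the data file: 30 tab-separated rows of 5 columns.
fake_file = io.StringIO(
    "".join(f"{i * 25}\t{i % 3}\t{i % 2}\t{i * 25 - 15000}\t0\n" for i in range(30))
)

# First pass: iterating consumes the file object.
line_count = sum(1 for line in fake_file if line != "\n")

# The handle is now at end-of-file, so a second read yields nothing.
leftover = fake_file.readlines()

# Rewinding (or simply reading once) avoids opening the file repeatedly.
fake_file.seek(0)
lines = fake_file.readlines()

firstlines = lines[:20]   # first 20 lines
lastlines = lines[-20:]   # last 20 lines, no hard-coded line numbers needed

print(line_count, len(leftover), len(firstlines), len(lastlines))
```

Negative slicing such as lines[-20:] also removes the need to count the lines first just to find the right indices.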
Reply
#2
It's just a hunch, but I think you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))
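For context, the TypeError is raised by the indexing itself, before float() or np.sum is ever called: readlines() returns a plain Python list of strings, and a list does not accept the two-dimensional index [:,1]. Here is a rough sketch of one way around it, assuming whitespace-separated numeric columns like those in this thread. The sample rows are invented for illustration.

```python
import numpy as np

# Illustrative lines in the shape readlines() would return (made-up values).
firstlines = [
    "         0\t         2\t         0\t    -15000\t         0\n",
    "        25\t         2\t         1\t    -14975\t         1\n",
    "        50\t         2\t         2\t    -14950\t         3\n",
]

# A list only supports integer or slice indices, so [:, 1] fails on it.
try:
    firstlines[:, 1]
    raised = False
except TypeError:
    raised = True

# Split each line on whitespace, convert the fields, and build a 2D array.
rows = np.array([[float(field) for field in line.split()] for line in firstlines])

d1 = rows[:, 1].sum()   # second column
d2 = rows[:, 2].sum()   # third column
print(raised, rows.shape, d1, d2)
```

Once the lines are a 2D array, column slices such as rows[:, 1] behave the same way as they do on the output of np.loadtxt.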
Reply
#3
(Jun-16-2021, 09:27 AM)paul18fr Wrote: It's just a hunch, but I think you should first convert your data to float or int (your data are strings) before using numpy

d1first = np.sum(float(firstlines[:,1]))

Hey, thanks for your reply! For some reason with both float() and int() the code returns the same exact error, so perhaps the commands are not properly working for readlines.

The data in a data file looks like this:
         0	         2	         0	    -15000	         0
        25	         2	         1	    -14975	         1
        50	         2	         2	    -14950	         3
        75	         2	         0	    -14925	         3
and so on for 1000+ rows.

I now added a few lines in the code to print the "firstlines" and "lastlines", and I get this:

Number of lines:  1212
First 20 lines :
['         0\t         1\t         1\t    -15000\t         2\n', '        25\t         0\t         0\t    -14975\t         4\n', '        50\t         1\t         2\t    -14950\t         3\n', '        75\t         1\t         2\t    -14925\t         2\n', '       100\t         0\t         4\t    -14900\t         3\n', '       125\t         1\t         0\t    -14875\t         4\n', '       150\t         0\t         0\t    -14850\t         2\n', '       175\t         1\t         1\t    -14825\t         1\n', '       200\t         2\t         2\t    -14800\t         0\n', '       225\t         1\t         0\t    -14775\t         1\n', '       250\t         1\t         2\t    -14750\t         3\n', '       275\t         1\t         0\t    -14725\t         0\n', '       300\t         0\t         3\t    -14700\t         5\n', '       325\t         0\t         0\t    -14675\t         2\n', '       350\t         1\t         0\t    -14650\t         4\n', '       375\t         3\t         2\t    -14625\t         2\n', '       400\t         3\t         2\t    -14600\t         5\n', '       425\t         1\t         3\t    -14575\t         4\n', '       450\t         2\t         2\t    -14550\t         2\n', '       475\t         1\t         0\t    -14525\t         2\n']
Last 20 lines :
['     29525\t         0\t         0\t     14525\t         1\n', '     29550\t         0\t         0\t     14550\t         4\n', '     29575\t         0\t         0\t     14575\t         0\n', '     29600\t         0\t         0\t     14600\t         2\n', '     29625\t         0\t         1\t     14625\t         4\n', '     29650\t         0\t         0\t     14650\t         3\n', '     29675\t         0\t         0\t     14675\t         2\n', '     29700\t         0\t         0\t     14700\t         3\n', '     29725\t         0\t         1\t     14725\t         4\n', '     29750\t         1\t         0\t     14750\t         0\n', '     29775\t         0\t         0\t     14775\t         4\n', '     29800\t         0\t         0\t     14800\t         3\n', '     29825\t         0\t         0\t     14825\t         4\n', '     29850\t         0\t         0\t     14850\t         2\n', '     29875\t         0\t         0\t     14875\t         2\n', '     29900\t         0\t         0\t     14900\t         4\n', '     29925\t         1\t         1\t     14925\t         3\n', '     29950\t         0\t         1\t     14950\t         1\n', '     29975\t         0\t         0\t     14975\t         0\n', '     30000\t         0\t         0\t     15000\t         5\n']
The \t and \n are confusing me a little. Could they be the reason the sum command does not work? With data loaded by np.loadtxt, np.sum works perfectly fine.
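The \t and \n characters are just the tab separators and line endings of the file, shown as escape sequences; they count as ordinary whitespace. As far as I know, np.loadtxt also accepts a list of strings in place of a filename, so the sliced lines can be handed to it directly. A small sketch using three of the rows printed above:

```python
import numpy as np

# Three of the lines from the printed "First 20 lines" list.
lines = [
    "         0\t         1\t         1\t    -15000\t         2\n",
    "        25\t         0\t         0\t    -14975\t         4\n",
    "        50\t         1\t         2\t    -14950\t         3\n",
]

# str.split() with no argument treats tabs, spaces and newlines alike.
fields = lines[0].split()

# np.loadtxt splits on whitespace by default and returns a float array.
data = np.loadtxt(lines)

col2_sum = data[:, 1].sum()
col3_sum = data[:, 2].sum()
print(fields, data.shape, col2_sum, col3_sum)
```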
Reply
#4
I'm not a regex expert, but the following code might help you

import re

# line = " 0           2           0      -15000           0"
line = "50           2           2      -14950           3"
Values = re.split(r"\s?([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)\s+([+\-]?\d+)", line)
print(f"1st value : {Values[1]}")
print(f"2nd value : {Values[2]}")
print(f"3rd value : {Values[3]}")
print(f"4th value : {Values[4]}")
print(f"5th value : {Values[5]}")
Reply
#5
(Jun-16-2021, 12:13 PM)paul18fr Wrote: I'm not a regex expert, but the following code might help you

Got the issue resolved by using np.loadtxt and data[-1:1,1] type commands. Thanks for your help!
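For anyone landing here later, the np.loadtxt approach could look roughly like the sketch below. The exact slices used in the final code are not shown in the thread, so data[:20] and data[-20:] are my assumption of what was intended, and the file contents are generated placeholders written to a temporary file rather than the real file.hst.

```python
import os
import tempfile

import numpy as np

# Write placeholder data (40 rows, 5 tab-separated columns) to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".hst", delete=False) as tmp:
    for i in range(40):
        tmp.write(f"{i * 25}\t{i % 2}\t1\t{i * 25 - 15000}\t0\n")
    path = tmp.name

data = np.loadtxt(path)   # 2D float array, one row per line
os.remove(path)

first = data[:20]    # first 20 rows
last = data[-20:]    # last 20 rows

# Average per detector (columns 2 and 3), then add the two averages.
sumfirst = first[:, 1].mean() + first[:, 2].mean()
sumlast = last[:, 1].mean() + last[:, 2].mean()

print("Average of first 20 (both detectors):", sumfirst)
print("Average of last 20 (both detectors):", sumlast)
```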
Reply

