Python Forum

Full Version: How to filter specific rows from large data file
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi I have a large data file and I'm only interested in rows with specific x values of 4.125 as shown below. Now because the value of 4.125 relates to the stop position of the ion, the corresponding start position is also of interest to me and I want to keep this information in the array. How do I write a program which effectively finds the x stop position of 4.125 and retains the ion start position. It is a 120982 * 9 array and in the example shown below I would be interested in keeping the information of ion # 3849096.

"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE" 3849094,0.00054858,-1,66.5216,-51,-3.8,-180,88.7,18160 3849094,0.00054858,-1,27.3925,30.3532,-4.07076,-177.1,41.5494,17697.2 3849095,0.00054858,-1,66.5216,-51,-3.7,-180,88.7,18160 3849095,0.00054858,-1,26.6277,31.0039,-3.91402,-177.096,40.8293,17699.4 3849096,0.00054858,-1,66.5216,-51,-3.6,-180,88.7,18160 3849096,0.00054858,-1,4.125,44.9887,-2.47517,-176.363,25.715,17711.1

This is the code I have developed so far but does not work:

import pandas as pd 
import numpy as np

opts = pd.read_csv('Ambre_2.dat',sep = ',', low_memory = False)
df = pd.DataFrame(opts)

X = df.iloc[:,3]
IonN = df.iloc[:,0]
tol = 1e-6
Fltr = 4.125

filterreddata = df[abs(df.X-Fltr)<tol,:]
filteredions = df(np.in1d(df.IonN, filterreddata.IonN), :]
filteredions[2:2:end, :] = []
f = open('ions.csv', 'w')
f.write(tabulate(filteredions))
f.close()

  1. What do you mean "does not work"?
  2. Line 4 gives DataFrame - why do you need line 5?
  3. You can save DataFrame with to_csv - why the hell you use tabulate?
  4. Why do you use iloc when df['X'] and df['Ion N'] would have worked?
What do you mean "does not work"?

I get the following error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed
That will find you the row

df[abs(df['X'] - 4.125) <= 1e-6]
Result
Output:
Ion N Mass Charge X Y Z Azm Elv KE 5 3849096 0.000549 -1 4.125 44.9887 -2.47517 -176.363 25.715 17711.1

(Jun-29-2018, 01:57 PM)Ariane Wrote: [ -> ]What do you mean "does not work"?

I get the following error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

In which bloody line do you get that message? Asking Python question 101 - if you have exception, show full traceback - or at least the part that contains reference to your code. In output tags, of course.
Thank you for your prompt reply. The error was attributed to line 13.
I think this is what you want...

filterreddata = df[abs(df['X'] - Fltr) < tol]
filteredions = df[np.in1d(df['Ion N'], filterreddata['Ion N'])]
I'm I correct in using np.in1d as a substitute for ismember (Matlab command)? I have been trying to find an equivalent function for ismember in python.
I don't know much about Matlab, but the numpy.in1D() do this:

Quote:Test whether each element of a 1-D array is also present in a second array.
numpy.in1D() reference