Bottom Page

Thread Rating:
  • 1 Vote(s) - 3 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 How to filter specific rows from large data file
#1
Hi I have a large data file and I'm only interested in rows with specific x values of 4.125 as shown below. Now because the value of 4.125 relates to the stop position of the ion, the corresponding start position is also of interest to me and I want to keep this information in the array. How do I write a program which effectively finds the x stop position of 4.125 and retains the ion start position. It is a 120982 * 9 array and in the example shown below I would be interested in keeping the information of ion # 3849096.

"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE" 3849094,0.00054858,-1,66.5216,-51,-3.8,-180,88.7,18160 3849094,0.00054858,-1,27.3925,30.3532,-4.07076,-177.1,41.5494,17697.2 3849095,0.00054858,-1,66.5216,-51,-3.7,-180,88.7,18160 3849095,0.00054858,-1,26.6277,31.0039,-3.91402,-177.096,40.8293,17699.4 3849096,0.00054858,-1,66.5216,-51,-3.6,-180,88.7,18160 3849096,0.00054858,-1,4.125,44.9887,-2.47517,-176.363,25.715,17711.1

This is the code I have developed so far but does not work:

import pandas as pd 
import numpy as np

opts = pd.read_csv('Ambre_2.dat',sep = ',', low_memory = False)
df = pd.DataFrame(opts)

X = df.iloc[:,3]
IonN = df.iloc[:,0]
tol = 1e-6
Fltr = 4.125

filterreddata = df[abs(df.X-Fltr)<tol,:]
filteredions = df(np.in1d(df.IonN, filterreddata.IonN), :]
filteredions[2:2:end, :] = []
f = open('ions.csv', 'w')
f.write(tabulate(filteredions))
f.close()
Quote
#2

  1. What do you mean "does not work"?
  2. Line 4 gives DataFrame - why do you need line 5?
  3. You can save DataFrame with to_csv - why the hell you use tabulate?
  4. Why do you use iloc when df['X'] and df['Ion N'] would have worked?
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Quote
#3
What do you mean "does not work"?

I get the following error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed
Quote
#4
That will find you the row

df[abs(df['X'] - 4.125) <= 1e-6]
Result
Output:
Ion N Mass Charge X Y Z Azm Elv KE 5 3849096 0.000549 -1 4.125 44.9887 -2.47517 -176.363 25.715 17711.1

(Jun-29-2018, 01:57 PM)Ariane Wrote: What do you mean "does not work"?

I get the following error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

In which bloody line do you get that message? Asking Python question 101 - if you have exception, show full traceback - or at least the part that contains reference to your code. In output tags, of course.
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Quote
#5
Thank you for your prompt reply. The error was attributed to line 13.
Quote
#6
I think this is what you want...

filterreddata = df[abs(df['X'] - Fltr) < tol]
filteredions = df[np.in1d(df['Ion N'], filterreddata['Ion N'])]
Quote
#7
I'm I correct in using np.in1d as a substitute for ismember (Matlab command)? I have been trying to find an equivalent function for ismember in python.
Quote
#8
I don't know much about Matlab, but the numpy.in1D() do this:

Quote:Test whether each element of a 1-D array is also present in a second array.
numpy.in1D() reference
volcano63 likes this post
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  Filter rows by multiple text conditions in another data frame i.e contains strings an Pan 0 262 Jun-09-2020, 06:05 AM
Last Post: Pan
  extracting sublist from a large multiple molecular file juliocollm 2 289 May-25-2020, 12:49 PM
Last Post: juliocollm
  How can I convert time-series data in rows into column srvmig 0 187 Apr-11-2020, 05:40 AM
Last Post: srvmig
  Sorting a large CVS file DavidTheGrockle 1 295 Oct-31-2019, 12:32 PM
Last Post: ichabod801
  Creating new rows and adding them to empty data frame kapilan15 0 506 May-31-2019, 10:19 AM
Last Post: kapilan15
  How to filter data using a panda.DateFrame.loc pawlo392 1 928 May-27-2019, 08:47 PM
Last Post: michalmonday
  Drop rows from data with zero value Devilish 3 972 Dec-27-2018, 02:06 AM
Last Post: Devilish
  Frequency Filter for wav file plotting in python manukanu 0 2,299 Dec-11-2018, 09:58 AM
Last Post: manukanu
  Write specific rows from pandas dataframe to csv file pradeepkumarbe 3 1,433 Oct-18-2018, 09:33 PM
Last Post: volcano63
  Removing rows at random based on the value of a specific column Mr_Keystrokes 4 918 Aug-24-2018, 11:15 AM
Last Post: Mr_Keystrokes

Forum Jump:


Users browsing this thread: 1 Guest(s)