Python Forum
How to filter specific rows from large data file
#1
Hi, I have a large data file and I'm only interested in rows where x equals 4.125, as shown below. Because 4.125 is the stop position of an ion, the corresponding start position is also of interest to me, and I want to keep that row in the array as well. How do I write a program that finds the rows with an x stop position of 4.125 and also retains the start position of the same ion? The data is a 120982 x 9 array, and in the example below I would want to keep both rows for ion # 3849096.

"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE"
3849094,0.00054858,-1,66.5216,-51,-3.8,-180,88.7,18160
3849094,0.00054858,-1,27.3925,30.3532,-4.07076,-177.1,41.5494,17697.2
3849095,0.00054858,-1,66.5216,-51,-3.7,-180,88.7,18160
3849095,0.00054858,-1,26.6277,31.0039,-3.91402,-177.096,40.8293,17699.4
3849096,0.00054858,-1,66.5216,-51,-3.6,-180,88.7,18160
3849096,0.00054858,-1,4.125,44.9887,-2.47517,-176.363,25.715,17711.1

This is the code I have developed so far, but it does not work:

import pandas as pd 
import numpy as np

opts = pd.read_csv('Ambre_2.dat',sep = ',', low_memory = False)
df = pd.DataFrame(opts)

X = df.iloc[:,3]
IonN = df.iloc[:,0]
tol = 1e-6
Fltr = 4.125

filterreddata = df[abs(df.X-Fltr)<tol,:]
filteredions = df(np.in1d(df.IonN, filterreddata.IonN), :]
filteredions[2:2:end, :] = []
f = open('ions.csv', 'w')
f.write(tabulate(filteredions))
f.close()
Reply
#2

  1. What do you mean "does not work"?
  2. Line 4 already gives you a DataFrame - why do you need line 5?
  3. You can save a DataFrame with to_csv - why the hell do you use tabulate?
  4. Why do you use iloc when df['X'] and df['Ion N'] would have worked?
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#3
What do you mean "does not work"?

I get the following error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed
Reply
#4
This will find you the row:

df[abs(df['X'] - 4.125) <= 1e-6]
Result
Output:
     Ion N      Mass  Charge      X        Y        Z      Azm     Elv       KE
5  3849096  0.000549      -1  4.125  44.9887 -2.47517 -176.363  25.715  17711.1

(Jun-29-2018, 01:57 PM)Ariane Wrote: What do you mean "does not work"?

I get the following error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

In which bloody line do you get that message? Asking Python questions 101: if you get an exception, show the full traceback, or at least the part that refers to your code. In output tags, of course.
Reply
#5
Thank you for your prompt reply. The error was attributed to line 13.
Reply
#6
I think this is what you want...

filterreddata = df[abs(df['X'] - Fltr) < tol]
filteredions = df[np.in1d(df['Ion N'], filterreddata['Ion N'])]
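For anyone who wants to try the two lines above end to end, here is a minimal sketch. It rests on a few assumptions: the six sample rows from the first post are read from an in-memory string (a real run would pass 'Ambre_2.dat' to read_csv), the pandas method Series.isin is swapped in for np.in1d, and the names sample, target_x, stops and csv_text are made up for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the first post; a real run would read 'Ambre_2.dat'.
sample = io.StringIO(
    '"Ion N","Mass","Charge","X","Y","Z","Azm","Elv","KE"\n'
    "3849094,0.00054858,-1,66.5216,-51,-3.8,-180,88.7,18160\n"
    "3849094,0.00054858,-1,27.3925,30.3532,-4.07076,-177.1,41.5494,17697.2\n"
    "3849095,0.00054858,-1,66.5216,-51,-3.7,-180,88.7,18160\n"
    "3849095,0.00054858,-1,26.6277,31.0039,-3.91402,-177.096,40.8293,17699.4\n"
    "3849096,0.00054858,-1,66.5216,-51,-3.6,-180,88.7,18160\n"
    "3849096,0.00054858,-1,4.125,44.9887,-2.47517,-176.363,25.715,17711.1\n"
)
df = pd.read_csv(sample)  # read_csv already returns a DataFrame

tol = 1e-6        # tolerance for the floating-point comparison
target_x = 4.125  # stop position of interest (hypothetical variable name)

# Step 1: rows whose X is within tol of the stop position.
stops = df[(df["X"] - target_x).abs() < tol]

# Step 2: every row (start and stop) belonging to the ions found in step 1.
filteredions = df[df["Ion N"].isin(stops["Ion N"])]

# Step 3: serialise with to_csv; pass a file name such as 'ions.csv' as the
# first argument to write to disk instead of returning a string.
csv_text = filteredions.to_csv(index=False)
print(csv_text)
```

On the sample this keeps the two rows of ion 3849096, the start row at X = 66.5216 and the stop row at X = 4.125.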
Reply
#7
Am I correct in using np.in1d as a substitute for ismember (the Matlab command)? I have been trying to find an equivalent function for ismember in Python.
Reply
#8
I don't know much about Matlab, but numpy.in1d() does this:

Quote: Test whether each element of a 1-D array is also present in a second array.
numpy.in1d() reference
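As a small illustration of that membership test, the sketch below builds the same kind of boolean mask that Matlab's ismember returns as its first output. It uses np.isin, which as far as I know is the function that recent NumPy releases recommend over np.in1d; the arrays ion_ids and wanted are made-up examples, not data from the thread.

```python
import numpy as np

# Hypothetical ion numbers, two rows per ion as in the sample data.
ion_ids = np.array([3849094, 3849094, 3849095, 3849095, 3849096, 3849096])
wanted = np.array([3849096])

# True wherever an element of ion_ids also occurs in wanted.
mask = np.isin(ion_ids, wanted)

# Boolean indexing keeps only the matching elements.
selected = ion_ids[mask]
print(mask)
print(selected)
```

One difference worth knowing: np.isin keeps the shape of its first argument, while np.in1d always returns a flat 1-D result.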
Reply

