Python Forum
Extracting rows based on condition on one column
#1
Hi Everyone,

Here is an interesting problem I am trying to solve.

I have an (N x 4) array and want to extract the rows whose third-column element lies in a certain range. Does NumPy have a built-in way to do this? Below is a simple example.

PS: I know a for loop could compare each element of column 3 and save the rows that meet the condition. But I want to use NumPy here (slicing etc., which should be much faster). In practice my arrays are large, and extra loops would cost too much computation time.

For example:
input = [[1,2,-97,4],
         [5,6,93,8],
         [9,10,-105,12],
         [11,12,105,14]]

output = [[1,2,-97,4], # desired output: rows whose third-column element is greater than -100 and less than 100
          [5,6,93,8]]
#2
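import numpy as np

# Named `arr` rather than `input` so the built-in input() is not shadowed
arr = np.array([[1,2,-97,4],
                [5,6,93,8],
                [9,10,-105,12],
                [11,12,105,14]])

# Boolean mask on the third column (index 2) selects the matching rows
arr[(-100 < arr[:, 2]) & (arr[:, 2] < 100)]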
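
To see what that one-liner does, it can help to split it into steps. This is only a sketch of the same idea; the names `arr`, `col`, `mask` and `result` are ones I am introducing here for illustration. The parentheses around each comparison matter, because `&` binds more tightly than `<` in Python.

```python
import numpy as np

arr = np.array([[1, 2, -97, 4],
                [5, 6, 93, 8],
                [9, 10, -105, 12],
                [11, 12, 105, 14]])

col = arr[:, 2]                     # third column as a 1-D array
mask = (col > -100) & (col < 100)   # element-wise AND, one boolean per row
result = arr[mask]                  # keeps only the rows where mask is True

print(mask)
print(result)
```

The mask has one entry per row, so indexing `arr` with it filters along the first axis and leaves the columns untouched.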
#3
Thanks, it worked!

Out of curiosity, here is a little test I did comparing execution times. The NumPy method appears to be about 75x faster than looping. Do you know what makes NumPy fast? Does it store the array in some efficient manner, or is it something else?

import time
import numpy as np

arr = np.arange(4*10**7).reshape((10**7, 4))

# First method: Using NumPy
start_time = time.time()
print(arr[(-1000 < arr[:, 2]) & (arr[:, 2] < 10000)])
print(time.time()-start_time)

# Second method: Without NumPy (explicit Python loop over the rows)
start_time = time.time()
diff = []
for row in range(10**7):
    if -1000 < arr[row, 2] < 10000:
        diff.append(arr[row, :])

print(diff)
print(time.time()-start_time)
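
On the "what makes NumPy fast" question, as far as I know the main reasons are that an array keeps its elements in one contiguous, fixed-type block of memory, and that the comparison and masking run in compiled loops instead of going through the Python interpreter once per row. A smaller version of the comparison can be sketched as below. The array size `n_rows` is my own assumption (reduced from the 10**7 above so the loop finishes quickly), and the timings will differ from machine to machine, so treat the ratio as indicative only.

```python
import time
import numpy as np

n_rows = 10**5  # assumed size, smaller than in the post above
arr = np.arange(4 * n_rows).reshape((n_rows, 4))

# Vectorized: two comparisons and one mask lookup, all done inside NumPy
t0 = time.perf_counter()
fast = arr[(-1000 < arr[:, 2]) & (arr[:, 2] < 10000)]
t_numpy = time.perf_counter() - t0

# Python-level loop: the interpreter handles every row individually
t0 = time.perf_counter()
slow = np.array([row for row in arr if -1000 < row[2] < 10000])
t_loop = time.perf_counter() - t0

print(f"NumPy mask: {t_numpy:.4f} s, Python loop: {t_loop:.4f} s")
print("Same rows selected:", np.array_equal(fast, slow))
```

`time.perf_counter()` is used here instead of `time.time()` because it is meant for measuring short intervals. For more careful numbers, the `timeit` module repeats the measurement several times.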