Select rows based on a criteria

klllmmm · (This post was last modified: May-11-2017, 01:51 PM by klllmmm.)

Thank you very much for the solution & the points noted.

1. "and t1.Ref != 'Transfer'" was a mistake. It has to be (t2.Ref = 'ADD' or t2.Ref = 'Transfer') and (t1.Ref = 'Deduct' or t1.Ref = 'Transfer')
2. I admit the fact that the conditions were actually vague in my initial quiz.
3. Follow up or additional quiz
In my main dataset(not this 10 test items above), i have to take out some records by keeping only
lowest datetime difference values between datetime_X & datetime_Y may be by ranking the date differences for each records matched in left table.
below resulting df illustrates the scenario.

import pandas as pd
import datetime as dt
 
table = pd.DataFrame(data = {'ID':[1,2,3,4,5,6,7,8,9,10,11,12],
                              'DateTime':['01-01-17 16:30','01-01-17 16:31','02-01-17 08:45','03-01-17 08:45','02-01-17 08:45','02-01-17 10:40','02-01-17 16:40','02-01-17 16:41','02-01-17 16:42','03-01-17 08:45','03-01-17 08:45','03-01-17 10:48'],
                             'Amount':[1000,2000,1000,500,1000,50000,4000,5000,9000,4000,5000,20000],
                             'Ref':['Deduct','Deduct','Add','Add','Add','Add','Transfer','Transfer','Deduct','Add','Add','Deduct'],
                             'DrCode':[1500,1400,9000,9000,9000,9000,1600,1700,2000,9000,9000,4000],
                             'CrCode':[9000,9000,1500,1500,1400,3000,2000,2000,9000,1600,1700,9000],})

table['DateTime'] = table['DateTime'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d-%m-%y %H:%M'))

df = pd.DataFrame(table, columns=['ID','DateTime', 'Amount', 'Ref', 'DrCode', 'CrCode'])

df = pd.merge(df, df, left_on='DrCode', right_on='CrCode', how='inner')
df = df.query("Amount_y <= Amount_x and (Ref_x == 'Deduct' or Ref_x=='Transfer') and (Ref_y=='Add' or Ref_y=='Transfer')")

df['DateDiff'] = df['DateTime_y']-df['DateTime_x']

I want to keep only the records with lowest absolute value of date difference for each 'ID_x' records where the 'DrCode_y' is not equal to 9000.

I'm expecting to do some thing like below SQL (using MS-SQL 2014)

SELECT *
FROM
      (select *
                 ,row_number() over (partition by(ID_x) order by ABS(datediff))  'DateIDRank'
       from df ) as t1
WHERE t1.DateIDRank = 1 or
             t1.DrCode_y != 9000

A help on how to do this with pandas is highly appreciated.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Merging rows and adding columns based on matching index	pythonnewbie78	3	830	Dec-24-2023, 11:51 AM Last Post: Pedroski55
	Pandas Dataframe Filtering based on rows	mvdlm	0	1,449	Apr-02-2022, 06:39 PM Last Post: mvdlm
	Extracting rows based on condition on one column	Robotguy	2	2,228	Aug-07-2020, 02:27 AM Last Post: Robotguy
	Dropping Rows From A Data Frame Based On A Variable	JoeDainton123	1	2,236	Aug-03-2020, 02:05 AM Last Post: scidam
	Counting Criteria in Pandas Question	Koenig	1	2,178	Sep-30-2019, 05:16 AM Last Post: perfringo
	select data based on indice	Staph	4	2,552	Jul-15-2019, 02:05 AM Last Post: scidam
	Removing rows at random based on the value of a specific column	Mr_Keystrokes	4	5,647	Aug-24-2018, 11:15 AM Last Post: Mr_Keystrokes
	Updating df rows based on 2 conditions	stretch	1	3,155	May-02-2018, 09:15 AM Last Post: volcano63
	drop duplicate values based on a criteria	klllmmm	1	5,621	Apr-28-2017, 10:17 AM Last Post: zivoni
	Insert values into a column in the same table based on a criteria	klllmmm	3	4,240	Apr-13-2017, 10:10 AM Last Post: zivoni

Select rows based on a criteria

User Panel Messages

Announcements