Thank you very much for the solution & the points noted.
1. "and t1.Ref != 'Transfer'" was a mistake. It has to be (t2.Ref = 'ADD' or t2.Ref = 'Transfer') and (t1.Ref = 'Deduct' or t1.Ref = 'Transfer')
2. I admit the fact that the conditions were actually vague in my initial quiz.
3. Follow up or additional quiz
In my main dataset(not this 10 test items above), i have to take out some records by keeping only
lowest datetime difference values between datetime_X & datetime_Y may be by ranking the date differences for each records matched in left table.
below resulting df illustrates the scenario.
I'm expecting to do some thing like below SQL (using MS-SQL 2014)
1. "and t1.Ref != 'Transfer'" was a mistake. It has to be (t2.Ref = 'ADD' or t2.Ref = 'Transfer') and (t1.Ref = 'Deduct' or t1.Ref = 'Transfer')
2. I admit the fact that the conditions were actually vague in my initial quiz.
3. Follow up or additional quiz
In my main dataset(not this 10 test items above), i have to take out some records by keeping only
lowest datetime difference values between datetime_X & datetime_Y may be by ranking the date differences for each records matched in left table.
below resulting df illustrates the scenario.
import pandas as pd import datetime as dt table = pd.DataFrame(data = {'ID':[1,2,3,4,5,6,7,8,9,10,11,12], 'DateTime':['01-01-17 16:30','01-01-17 16:31','02-01-17 08:45','03-01-17 08:45','02-01-17 08:45','02-01-17 10:40','02-01-17 16:40','02-01-17 16:41','02-01-17 16:42','03-01-17 08:45','03-01-17 08:45','03-01-17 10:48'], 'Amount':[1000,2000,1000,500,1000,50000,4000,5000,9000,4000,5000,20000], 'Ref':['Deduct','Deduct','Add','Add','Add','Add','Transfer','Transfer','Deduct','Add','Add','Deduct'], 'DrCode':[1500,1400,9000,9000,9000,9000,1600,1700,2000,9000,9000,4000], 'CrCode':[9000,9000,1500,1500,1400,3000,2000,2000,9000,1600,1700,9000],}) table['DateTime'] = table['DateTime'].apply(lambda x: dt.datetime.strptime(x,'%d-%m-%y %H:%M')) df = pd.DataFrame(table, columns=['ID','DateTime', 'Amount', 'Ref', 'DrCode', 'CrCode']) df = pd.merge(df, df, left_on='DrCode', right_on='CrCode', how='inner') df = df.query("Amount_y <= Amount_x and (Ref_x == 'Deduct' or Ref_x=='Transfer') and (Ref_y=='Add' or Ref_y=='Transfer')") df['DateDiff'] = df['DateTime_y']-df['DateTime_x']I want to keep only the records with lowest absolute value of date difference for each 'ID_x' records where the 'DrCode_y' is not equal to 9000.
I'm expecting to do some thing like below SQL (using MS-SQL 2014)
SELECT * FROM (select * ,row_number() over (partition by(ID_x) order by ABS(datediff)) 'DateIDRank' from df ) as t1 WHERE t1.DateIDRank = 1 or t1.DrCode_y != 9000A help on how to do this with pandas is highly appreciated.