Python Forum
Matching multiple conditions and mapping multiple results, between two dataframes - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Matching multiple conditions and mapping multiple results, between two dataframes (/thread-31758.html)



Matching multiple conditions and mapping multiple results, between two dataframes - daveyg33 - Jan-01-2021

I'm very new to python and really don't know where to start doing the following:
I have two dataframes, df1 and df2.

df1
  fruit    id   date
0 apple    2    01/10/20
1 pear     1    15/09/20
2 banana   3    01/06/20
3 peach    4    10/04/20

df2
  name      uid     ndate
0  paul      2    02/11/20
1  tracy     1    15/12/20
2  iain      3    01/05/20
3  frida     4    23/02/20
4  david     2    06/06/20
5  peter     3    19/11/20  
6  adam      4    07/03/20
7  eve       1    30/11/20
8  hannah    2    25/09/20
9  janine    2    13/08/20
10 charlotte 5    10/04/20 
I want to map 'name' values from df2 to new columns in df1.
There are two conditions to be met, first that the 'uid' in df2 must be the same as the 'id' in df1,
and the second that the 'ndate' in df2 must be before (less than) the 'date' in df1.
Ideally each matched value should return to new columns to the right of date in df1.

df1
  fruit    id   date       match1   match2   match3 ...(etc)
0 apple    2    01/10/20   david    hannah   janine
1 pear     1    15/09/20   
2 banana   3    01/06/20   iain
3 peach    4    10/04/20   frida    adam
Hope this makes sense, I've simplified the tables as much as possible, and this is the gist of what I'm looking to do. Any suggestions much appreciated, thanks in advance!


RE: Matching multiple conditions and mapping multiple results, between two dataframes - klllmmm - Jan-02-2021

Hi deveyge33

Good day!. Hope the below answer helps.

import pandas as pd
import datetime as dt

# Load the datasets & convert text into datatime format
df1 = pd.DataFrame(data={"fruit":['apple','pear','banana','peach'],
                          'id':[2,1,3,4],
                          'date':['01/10/20','15/09/20','01/06/20','10/04/20']})

df1['date'] = df1['date'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d/%m/%y'))


df2 = pd.DataFrame(data={"name":['paul','tracy','iain','frida','david','peter','adam','eve','hannah','janine','charlotte'],
                          'uid':[2,1,3,4,2,3,4,1,2,2,5],
                          'ndate':['02/11/20','15/12/20','01/05/20','23/02/20','06/06/20','19/11/20','07/03/20','30/11/20','25/09/20','13/08/20','10/04/20']})

df2['ndate'] = df2['ndate'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d/%m/%y'))
then,
# Mearge the two dataframes using "ID" and "UID" columns
df_temp = pd.merge(df1,df2,left_on=['id'], right_on=['uid'],how='left',indicator='True')

# Locate only matches with date condition
df_temp1 = df_temp.loc[ (df_temp['ndate']<=df_temp['date'])].copy()

# Add a sequance for multiple matches
df_temp1['Sequance'] = df_temp1.groupby(['fruit',"uid"]).cumcount()+1

# Delete the "ndate" columns
del df_temp1['ndate']

# Unstack the name column
df_temp2 = df_temp1.set_index(['fruit', 'id', 'date','Sequance'])['name'].unstack().reset_index()

# Merge again to get df1 items that doesn't have matching items in df2
df_result = pd.merge(df1,df_temp2,left_on=['id','date','fruit'], right_on=['id','date','fruit'],how='left')

print(df_result)
del df_temp1, df_temp2
Output:
print(df_result) fruit id date 1 2 3 0 apple 2 2020-10-01 david hannah janine 1 pear 1 2020-09-15 NaN NaN NaN 2 banana 3 2020-06-01 iain NaN NaN 3 peach 4 2020-04-10 frida adam NaN



RE: Matching multiple conditions and mapping multiple results, between two dataframes - daveyg33 - Jan-02-2021

Amazing. Thanks so much!