Python Forum
Matching multiple conditions and mapping multiple results, between two dataframes
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Matching multiple conditions and mapping multiple results, between two dataframes
#1
I'm very new to python and really don't know where to start doing the following:
I have two dataframes, df1 and df2.

df1
  fruit    id   date
0 apple    2    01/10/20
1 pear     1    15/09/20
2 banana   3    01/06/20
3 peach    4    10/04/20

df2
  name      uid     ndate
0  paul      2    02/11/20
1  tracy     1    15/12/20
2  iain      3    01/05/20
3  frida     4    23/02/20
4  david     2    06/06/20
5  peter     3    19/11/20  
6  adam      4    07/03/20
7  eve       1    30/11/20
8  hannah    2    25/09/20
9  janine    2    13/08/20
10 charlotte 5    10/04/20 
I want to map 'name' values from df2 to new columns in df1.
There are two conditions to be met, first that the 'uid' in df2 must be the same as the 'id' in df1,
and the second that the 'ndate' in df2 must be before (less than) the 'date' in df1.
Ideally each matched value should return to new columns to the right of date in df1.

df1
  fruit    id   date       match1   match2   match3 ...(etc)
0 apple    2    01/10/20   david    hannah   janine
1 pear     1    15/09/20   
2 banana   3    01/06/20   iain
3 peach    4    10/04/20   frida    adam
Hope this makes sense, I've simplified the tables as much as possible, and this is the gist of what I'm looking to do. Any suggestions much appreciated, thanks in advance!
Reply
#2
Hi deveyge33

Good day!. Hope the below answer helps.

import pandas as pd
import datetime as dt

# Load the datasets & convert text into datatime format
df1 = pd.DataFrame(data={"fruit":['apple','pear','banana','peach'],
                          'id':[2,1,3,4],
                          'date':['01/10/20','15/09/20','01/06/20','10/04/20']})

df1['date'] = df1['date'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d/%m/%y'))


df2 = pd.DataFrame(data={"name":['paul','tracy','iain','frida','david','peter','adam','eve','hannah','janine','charlotte'],
                          'uid':[2,1,3,4,2,3,4,1,2,2,5],
                          'ndate':['02/11/20','15/12/20','01/05/20','23/02/20','06/06/20','19/11/20','07/03/20','30/11/20','25/09/20','13/08/20','10/04/20']})

df2['ndate'] = df2['ndate'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d/%m/%y'))
then,
# Mearge the two dataframes using "ID" and "UID" columns
df_temp = pd.merge(df1,df2,left_on=['id'], right_on=['uid'],how='left',indicator='True')

# Locate only matches with date condition
df_temp1 = df_temp.loc[ (df_temp['ndate']<=df_temp['date'])].copy()

# Add a sequance for multiple matches
df_temp1['Sequance'] = df_temp1.groupby(['fruit',"uid"]).cumcount()+1

# Delete the "ndate" columns
del df_temp1['ndate']

# Unstack the name column
df_temp2 = df_temp1.set_index(['fruit', 'id', 'date','Sequance'])['name'].unstack().reset_index()

# Merge again to get df1 items that doesn't have matching items in df2
df_result = pd.merge(df1,df_temp2,left_on=['id','date','fruit'], right_on=['id','date','fruit'],how='left')

print(df_result)
del df_temp1, df_temp2
Output:
print(df_result) fruit id date 1 2 3 0 apple 2 2020-10-01 david hannah janine 1 pear 1 2020-09-15 NaN NaN NaN 2 banana 3 2020-06-01 iain NaN NaN 3 peach 4 2020-04-10 frida adam NaN
Reply
#3
Amazing. Thanks so much!
klllmmm likes this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Statsmodels Multiple Regression Syntax Error Burger 2 398 Jul-13-2021, 03:04 AM
Last Post: Burger
  Import multiple CSV files into pandas Krayna 0 311 May-20-2021, 04:56 PM
Last Post: Krayna
  got an error while calculating a pct_change on a multiple tickers yfinace df Tradigdde 1 553 Nov-28-2020, 08:18 PM
Last Post: Tradigdde
  Automating to generate multiple arrays Robotguy 1 573 Nov-05-2020, 08:14 AM
Last Post: Gribouillis
  Time Series forecating with multiple independent variables Krychol88 1 539 Oct-23-2020, 08:11 AM
Last Post: DPaul
  matching a row from 2 seperate dataframes and minimg a column randor 0 799 Sep-03-2020, 08:57 PM
Last Post: randor
  Loading multiple JSON files to create a csv 0LI5A3A 0 898 Jun-28-2020, 10:35 PM
Last Post: 0LI5A3A
  Filter rows by multiple text conditions in another data frame i.e contains strings an Pan 0 901 Jun-09-2020, 06:05 AM
Last Post: Pan
  Least-squares fit multiple data sets multiverse22 1 1,041 Jun-06-2020, 01:38 AM
Last Post: Larz60+
  extracting sublist from a large multiple molecular file juliocollm 2 899 May-25-2020, 12:49 PM
Last Post: juliocollm

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020