Python Forum
Matching multiple conditions and mapping multiple results, between two dataframes
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Matching multiple conditions and mapping multiple results, between two dataframes
#1
I'm very new to python and really don't know where to start doing the following:
I have two dataframes, df1 and df2.

df1
  fruit    id   date
0 apple    2    01/10/20
1 pear     1    15/09/20
2 banana   3    01/06/20
3 peach    4    10/04/20

df2
  name      uid     ndate
0  paul      2    02/11/20
1  tracy     1    15/12/20
2  iain      3    01/05/20
3  frida     4    23/02/20
4  david     2    06/06/20
5  peter     3    19/11/20  
6  adam      4    07/03/20
7  eve       1    30/11/20
8  hannah    2    25/09/20
9  janine    2    13/08/20
10 charlotte 5    10/04/20 
I want to map 'name' values from df2 to new columns in df1.
There are two conditions to be met, first that the 'uid' in df2 must be the same as the 'id' in df1,
and the second that the 'ndate' in df2 must be before (less than) the 'date' in df1.
Ideally each matched value should return to new columns to the right of date in df1.

df1
  fruit    id   date       match1   match2   match3 ...(etc)
0 apple    2    01/10/20   david    hannah   janine
1 pear     1    15/09/20   
2 banana   3    01/06/20   iain
3 peach    4    10/04/20   frida    adam
Hope this makes sense, I've simplified the tables as much as possible, and this is the gist of what I'm looking to do. Any suggestions much appreciated, thanks in advance!
Reply
#2
Hi deveyge33

Good day!. Hope the below answer helps.

import pandas as pd
import datetime as dt

# Load the datasets & convert text into datatime format
df1 = pd.DataFrame(data={"fruit":['apple','pear','banana','peach'],
                          'id':[2,1,3,4],
                          'date':['01/10/20','15/09/20','01/06/20','10/04/20']})

df1['date'] = df1['date'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d/%m/%y'))


df2 = pd.DataFrame(data={"name":['paul','tracy','iain','frida','david','peter','adam','eve','hannah','janine','charlotte'],
                          'uid':[2,1,3,4,2,3,4,1,2,2,5],
                          'ndate':['02/11/20','15/12/20','01/05/20','23/02/20','06/06/20','19/11/20','07/03/20','30/11/20','25/09/20','13/08/20','10/04/20']})

df2['ndate'] = df2['ndate'].apply(lambda x: 
                                    dt.datetime.strptime(x,'%d/%m/%y'))
then,
# Mearge the two dataframes using "ID" and "UID" columns
df_temp = pd.merge(df1,df2,left_on=['id'], right_on=['uid'],how='left',indicator='True')

# Locate only matches with date condition
df_temp1 = df_temp.loc[ (df_temp['ndate']<=df_temp['date'])].copy()

# Add a sequance for multiple matches
df_temp1['Sequance'] = df_temp1.groupby(['fruit',"uid"]).cumcount()+1

# Delete the "ndate" columns
del df_temp1['ndate']

# Unstack the name column
df_temp2 = df_temp1.set_index(['fruit', 'id', 'date','Sequance'])['name'].unstack().reset_index()

# Merge again to get df1 items that doesn't have matching items in df2
df_result = pd.merge(df1,df_temp2,left_on=['id','date','fruit'], right_on=['id','date','fruit'],how='left')

print(df_result)
del df_temp1, df_temp2
Output:
print(df_result) fruit id date 1 2 3 0 apple 2 2020-10-01 david hannah janine 1 pear 1 2020-09-15 NaN NaN NaN 2 banana 3 2020-06-01 iain NaN NaN 3 peach 4 2020-04-10 frida adam NaN
Reply
#3
Amazing. Thanks so much!
klllmmm likes this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Add multiple vertical rectangles to a chart in plotly? devansing 2 1,218 Jun-20-2023, 07:55 AM
Last Post: snippsat
  1st layer tf.keras output shape set at multiple - need help! Afrodizzyjack 0 1,826 Jun-07-2022, 04:53 PM
Last Post: Afrodizzyjack
  Creating a Dataframe from Zenodo zip file with multiple CSVs about Spotify man0s 0 1,358 Apr-26-2022, 01:45 PM
Last Post: man0s
  Statsmodels Multiple Regression Syntax Error Burger 2 2,821 Jul-13-2021, 03:04 AM
Last Post: Burger
  Import multiple CSV files into pandas Krayna 0 1,723 May-20-2021, 04:56 PM
Last Post: Krayna
  got an error while calculating a pct_change on a multiple tickers yfinace df Tradigdde 1 1,493 Nov-28-2020, 08:18 PM
Last Post: Tradigdde
  Automating to generate multiple arrays Robotguy 1 1,807 Nov-05-2020, 08:14 AM
Last Post: Gribouillis
  Time Series forecating with multiple independent variables Krychol88 1 1,874 Oct-23-2020, 08:11 AM
Last Post: DPaul
  matching a row from 2 seperate dataframes and minimg a column randor 0 1,544 Sep-03-2020, 08:57 PM
Last Post: randor
  Loading multiple JSON files to create a csv 0LI5A3A 0 2,108 Jun-28-2020, 10:35 PM
Last Post: 0LI5A3A

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020