Python Forum
Compare between 2 DataFrames
#1
Hi all,

I am new to Python and have got stuck. I want to select data after comparing two DataFrames: for every Consumer_No in df1 that also appears in df2, I want all of its records whose date is less than or equal to that consumer's df2['Date'].

import pandas as pd

df1 = {'Consumer_No': [1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014],
       'Date': ['31/01/2019','31/01/2019','31/01/2019','31/01/2019','28/02/2019','28/02/2019','28/02/2019','28/02/2019','31/03/2019','31/03/2019','31/03/2019','31/03/2019','30/04/2019','30/04/2019','30/04/2019','30/04/2019']
      }

df2 = {'Consumer_No': [1013,1011], 'Date': ['24/03/2019','29/04/2019']}

if df1['Consumer_No'] == df2['Consumer_No']:
    if df1['Date'] >= df2['Date']:
        matched = pd.merge(df1, df2, on='Consumer_No')
Please help me with this.
#2
Something is wrong with this code: df1 and df2 are not pandas DataFrames,
they are Python dictionaries.
I don't understand the second condition. Are you requiring that df1.Date be greater (later)
than at least one date in df2.Date? These DataFrames have different sizes, so we cannot perform
an elementwise comparison.

Nevertheless, try the following; this is probably what you are looking for:

import pandas as pd

df1 = pd.DataFrame({'Consumer_No': [1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014],
                    'Date': ['31/01/2019','31/01/2019','31/01/2019','31/01/2019','28/02/2019','28/02/2019','28/02/2019','28/02/2019','31/03/2019','31/03/2019','31/03/2019','31/03/2019','30/04/2019','30/04/2019','30/04/2019','30/04/2019']
                   })

df2 = pd.DataFrame({'Consumer_No': [1013,1011], 'Date': ['24/03/2019','29/04/2019']})

# The dates are day-first (dd/mm/yyyy), so pass the format explicitly
df1['Date'] = pd.to_datetime(df1['Date'], format='%d/%m/%Y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y')

# Keep consumers that appear in df2...
first_condition = df1.Consumer_No.isin(df2.Consumer_No)
# ...whose record date is later than the earliest date in df2
second_condition = df1.Date > df2.Date.min()
result = df1.loc[first_condition & second_condition]
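A small sketch of the problems the reply points at, using illustrative Series `s1` and `s2` (hypothetical stand-ins for the two Consumer_No columns): pandas refuses an elementwise `==` between Series of different lengths, an `if` on a boolean Series is ambiguous even when the lengths match, and dd/mm/yyyy strings compare character by character rather than by date, which is why the dates are converted with `pd.to_datetime` before comparing.

```python
import pandas as pd

# s1 and s2 are hypothetical stand-ins for df1.Consumer_No and df2.Consumer_No
s1 = pd.Series([1011, 1012, 1013, 1014])
s2 = pd.Series([1013, 1011])

# Comparison operators need identically labeled Series; different lengths raise ValueError
try:
    _ = s1 == s2
    size_mismatch_error = False
except ValueError:
    size_mismatch_error = True

# Even with equal lengths, `if` on a boolean Series is ambiguous and raises ValueError
try:
    if s1 == s1:
        pass
    ambiguous_truth_error = False
except ValueError:
    ambiguous_truth_error = True

# dd/mm/yyyy strings compare character by character, not chronologically
string_says_later = '31/01/2019' > '24/03/2019'  # True, although 31 Jan is earlier
parsed = pd.to_datetime(pd.Series(['31/01/2019', '24/03/2019']), format='%d/%m/%Y')
datetime_says_later = parsed[0] > parsed[1]  # False: 31 Jan 2019 is before 24 Mar 2019

print(size_mismatch_error, ambiguous_truth_error, string_says_later, datetime_says_later)
```

This is why the reply uses `isin` for the consumer check and parses the dates before comparing them.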
#3
Thank you very much for your reply, scidam. I am actually reading two CSV files, which produce two different DataFrames, df1 and df2. df1 contains monthly data, while df2 contains information about the check performed on a particular user. I used the second condition because I want to extract all prior records of each matched consumer, up to the date given for that consumer in df2.

(Quoting post #2 by scidam, Jul-25-2019, 11:51 PM.)
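The second condition in the reply compares every df1 date with a single global date (the earliest in df2) and keeps the later records. For the requirement described in #1 and #3 (each matched consumer's records on or before that consumer's own df2 date), one possible approach, sketched here under the assumption that each Consumer_No appears at most once in df2, is to merge the check date onto df1 and filter row by row. The name `merged` and the `Date_check` column are hypothetical.

```python
import pandas as pd

# Sample data from the thread: monthly records (df1) and check dates (df2)
df1 = pd.DataFrame({
    'Consumer_No': [1011, 1012, 1013, 1014] * 4,
    'Date': ['31/01/2019'] * 4 + ['28/02/2019'] * 4 + ['31/03/2019'] * 4 + ['30/04/2019'] * 4,
})
df2 = pd.DataFrame({'Consumer_No': [1013, 1011], 'Date': ['24/03/2019', '29/04/2019']})

# Parse the day-first strings so comparisons are chronological
df1['Date'] = pd.to_datetime(df1['Date'], format='%d/%m/%Y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y')

# Inner merge keeps only consumers present in df2 and attaches each one's check date
# (hypothetical column name Date_check comes from the suffix)
merged = df1.merge(df2, on='Consumer_No', suffixes=('', '_check'))

# Keep the monthly records on or before that consumer's own check date
result = merged.loc[merged['Date'] <= merged['Date_check'], ['Consumer_No', 'Date']]
result = result.sort_values(['Consumer_No', 'Date']).reset_index(drop=True)
print(result)
```

The inner merge does double duty: it drops consumers with no check (1012 and 1014 here) and lines up each monthly record with the right check date. If a consumer could have several checks in df2, the merge would repeat that consumer's df1 rows once per check, so df2 would need deduplicating first (for example, keeping only the latest check per consumer).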