Python Forum
Compare between 2 DataFrames

Compare between 2 DataFrames - Nidhesh - Jul-25-2019

Hi all,

I am a newbie in Python and got stuck. I want to select data after comparing two DataFrames: for each Consumer_No that also appears in df2, I want all the records in df1 whose date is less than or equal to df2['Date'].

import pandas as pd

df1 = {'Consumer_No': [1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014],
       'Date': ['31/01/2019','31/01/2019','31/01/2019','31/01/2019','28/02/2019','28/02/2019','28/02/2019','28/02/2019','31/03/2019','31/03/2019','31/03/2019','31/03/2019','30/04/2019','30/04/2019','30/04/2019','30/04/2019']
      }

df2 = {'Consumer_No': [1013,1011], 'Date': ['24/03/2019','29/04/2019']}

if df1['Consumer_No'] == df2['Consumer_No']:
    if df1['Date'] >= df2['Date']:
        matched = pd.merge(df1, df2, on='Consumer_No')
Please help me out with this.


RE: Compare between 2 DataFrames - scidam - Jul-25-2019

Something is wrong with this code: df1 and df2 are not pandas DataFrames,
they are Python dictionaries.
I don't understand the second condition: are you requiring that df1.Date is greater (later)
than at least one date in df2.Date? These DataFrames have different sizes, so we cannot perform
an element-wise comparison.
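To see concretely why the `if` statements in the original post cannot work, here is a minimal sketch using a shortened copy of the Consumer_No values (the variable names are illustrative only). Comparing two plain lists with `==` gives one `True`/`False` for the whole list; comparing two pandas Series of different lengths raises a `ValueError`; and even a valid element-wise comparison returns a Series, which cannot be used directly as an `if` condition.

```python
import pandas as pd

consumers_df1 = [1011, 1012, 1013, 1014]
consumers_df2 = [1013, 1011]

# Plain lists: `==` compares the lists as a whole and gives one bool,
# not one result per element.
lists_equal = consumers_df1 == consumers_df2
print(lists_equal)  # False

s1 = pd.Series(consumers_df1)
s2 = pd.Series(consumers_df2)

# Series with different lengths (different index labels) cannot be
# compared element-wise; pandas raises ValueError.
try:
    s1 == s2
    series_error = None
except ValueError as exc:
    series_error = str(exc)
print(series_error)

# A valid element-wise comparison returns a Series of booleans,
# and using that Series as an `if` condition raises ValueError.
try:
    if s1 == s1:
        pass
    truth_error = None
except ValueError as exc:
    truth_error = str(exc)
print(truth_error)
```

This is why filtering in pandas is done with boolean masks (as in the reply's `df1.loc[...]`) rather than with `if`.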

Nevertheless, try the following; this is probably what you are looking for:

import pandas as pd

df1 = pd.DataFrame({'Consumer_No': [1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014,1011,1012,1013,1014],
                    'Date': ['31/01/2019','31/01/2019','31/01/2019','31/01/2019','28/02/2019','28/02/2019','28/02/2019','28/02/2019','31/03/2019','31/03/2019','31/03/2019','31/03/2019','30/04/2019','30/04/2019','30/04/2019','30/04/2019']
                    })

df2 = pd.DataFrame({'Consumer_No': [1013,1011], 'Date': ['24/03/2019','29/04/2019']})

# The dates are day-first (dd/mm/yyyy), so state the format explicitly.
df1['Date'] = pd.to_datetime(df1['Date'], format='%d/%m/%Y')
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y')

# Rows whose consumer also appears in df2 ...
first_condition = df1['Consumer_No'].isin(df2['Consumer_No'])
# ... and whose date is later than the earliest date in df2.
second_condition = df1['Date'] > df2['Date'].min()
result = df1.loc[first_condition & second_condition]
print(result)
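The dates in this thread are written day-first (dd/mm/yyyy). When no format is given, pandas reads an ambiguous string such as `'01/02/2019'` month-first by default (`dayfirst=False`), so passing an explicit `format` avoids silently swapping day and month. A short sketch; `'01/02/2019'` is a made-up example, not a date from the thread:

```python
import pandas as pd

# Default parsing of an ambiguous string is month-first: 2 January 2019.
default_ts = pd.to_datetime('01/02/2019')

# Explicit day-first format: 1 February 2019.
explicit_ts = pd.to_datetime('01/02/2019', format='%d/%m/%Y')

print(default_ts.month, explicit_ts.month)  # 1 2

# The same format string works for a whole column of thread-style dates.
dates = pd.to_datetime(pd.Series(['31/01/2019', '24/03/2019']), format='%d/%m/%Y')
print(dates.dt.month.tolist())  # [1, 3]
```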



RE: Compare between 2 DataFrames - Nidhesh - Jul-26-2019

Thank you very much for your reply, scidam. I am actually reading two CSV files, which give me two different DataFrames, df1 and df2. df1 contains monthly data, while df2 contains information about the check performed on a particular user. I used the second condition because I want to extract all records of each matched consumer prior to the date mentioned in df2 for that consumer.
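Going by this follow-up, each consumer's records need to be compared with that consumer's own date in df2, whereas `df1.Date > min(df2.Date)` compares every row with one global date. Below is a minimal sketch of a per-consumer filter: merge on Consumer_No, then compare the two dates row by row. It assumes df2 holds at most one check per consumer (with several checks, the merge yields one row per check and a record could appear more than once) and uses "on or before" (`<=`), as in the original question. The `Date_check` column name is a hypothetical choice for illustration.

```python
import pandas as pd

# Sample data from the thread (dates are dd/mm/yyyy).
df1 = pd.DataFrame({
    'Consumer_No': [1011, 1012, 1013, 1014] * 4,
    'Date': ['31/01/2019'] * 4 + ['28/02/2019'] * 4
            + ['31/03/2019'] * 4 + ['30/04/2019'] * 4,
})
df2 = pd.DataFrame({'Consumer_No': [1013, 1011],
                    'Date': ['24/03/2019', '29/04/2019']})

# Parse the day-first date strings explicitly.
for frame in (df1, df2):
    frame['Date'] = pd.to_datetime(frame['Date'], format='%d/%m/%Y')

# Inner merge: keeps only consumers present in both frames and attaches
# each consumer's df2 date as 'Date_check' (hypothetical column name).
merged = df1.merge(df2, on='Consumer_No', how='inner', suffixes=('', '_check'))

# Keep the monthly records on or before that consumer's own check date.
mask = merged['Date'] <= merged['Date_check']
result = (merged.loc[mask, ['Consumer_No', 'Date']]
          .sort_values(['Consumer_No', 'Date'])
          .reset_index(drop=True))
print(result)
```

With the thread's data this keeps 31/01, 28/02 and 31/03/2019 for consumer 1011 and 31/01 and 28/02/2019 for consumer 1013. The final `sort_values` only makes the row order predictable.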
