Python Forum

Hello, new here and new to python. I am trying to write a function which will display only non-matched and/or missing rows between two datasets. Some of the datasets will be in the thousands of rows. Closest I can get thus far is with two pandas dataframes using merge, which shows all rows but any advice to only get non-matched or a better approach?

Example:

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
diff=pd.merge(df1,df2,how='outer',on=['ID'],suffixes=('_SRC','_TGT'),indicator=True)
print(diff)

Output:    ID DESCRIPTION_SRC    DESCRIPTION_TGT
0  id1           desc1  wrong description
1  id2           desc2              desc2
2  id3           desc3                NaN

How would I get similar result with only row 0 and row 2?

Thank you.

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
print(pd.merge(df1,df2, indicator=True, how='outer')
         .query('_merge=="left_only"')
         .drop('_merge', axis=1))

Output:  ID DESCRIPTION
0  id1       desc1
2  id3       desc3

Does this work?

Thank you so much! It is closer, but I am wanting to capture the differences between source and target. I am wanting output something like:

Output:    ID DESCRIPTION_SRC    DESCRIPTION_TGT
0  id1           desc1  wrong description
2  id3           desc3                NaN

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
df1a = df1.copy()
df1a['Desc2']=df2['DESCRIPTION']
df1a['Different'] = (df1a['DESCRIPTION'] != df1a['Desc2'])
df1a = df1a[df1a.Different == True]
df1a

Output:ID	DESCRIPTION	Desc2	Different
0	id1	desc1	wrong description	True
2	id3	desc3	NaN	True

You can then strip away the Different column.

LowEnd

jefsummers

LowEnd

jefsummers