Nov-11-2021, 09:15 PM
Hello, I'm new here and new to Python. I am trying to write a function that displays only the non-matching and/or missing rows between two datasets, some of which will have thousands of rows. The closest I have come so far is merging two pandas DataFrames with merge, which shows all rows. Any advice on how to get only the non-matching rows, or on a better approach?
Example:
import pandas as pd

data1 = [('id1', 'desc1'), ('id2', 'desc2'), ('id3', 'desc3')]
data2 = [('id1', 'wrong description'), ('id2', 'desc2')]
df1 = pd.DataFrame(data1, columns=['ID', 'DESCRIPTION'])
df2 = pd.DataFrame(data2, columns=['ID', 'DESCRIPTION'])
diff = pd.merge(df1, df2, how='outer', on=['ID'], suffixes=('_SRC', '_TGT'), indicator=True)
print(diff)
Output:
    ID DESCRIPTION_SRC    DESCRIPTION_TGT
0  id1           desc1  wrong description
1  id2           desc2              desc2
2  id3           desc3                NaN
How would I get a similar result with only row 0 and row 2? Thank you.
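One possible way to do this, offered as a sketch rather than the definitive answer: keep the outer merge from the example and then filter it with a boolean mask. The helper name `non_matching_rows` and its `key` and `col` parameters are made up for illustration; the `_merge` column is the one pandas adds when `indicator=True` is passed.

```python
import pandas as pd


def non_matching_rows(df1, df2, key="ID", col="DESCRIPTION"):
    """Hypothetical helper: return rows that are missing on one side or whose values differ."""
    merged = pd.merge(
        df1, df2, how="outer", on=[key],
        suffixes=("_SRC", "_TGT"), indicator=True,
    )
    # Rows whose key exists in only one of the two frames.
    missing = merged["_merge"] != "both"
    # Rows whose key exists in both frames but whose values differ.
    # Caveat: NaN != NaN is True, so a matched row with NaN on both
    # sides would also be reported as a difference.
    changed = merged[f"{col}_SRC"] != merged[f"{col}_TGT"]
    return merged[missing | changed]


data1 = [("id1", "desc1"), ("id2", "desc2"), ("id3", "desc3")]
data2 = [("id1", "wrong description"), ("id2", "desc2")]
df1 = pd.DataFrame(data1, columns=["ID", "DESCRIPTION"])
df2 = pd.DataFrame(data2, columns=["ID", "DESCRIPTION"])

result = non_matching_rows(df1, df2)
print(result)
```

With the sample data this keeps the rows for id1 (different description) and id3 (missing from the second frame) and drops id2. The `_merge` column stays in the result, so each remaining row shows whether it was found in both frames or only in one.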