Nov-11-2021, 09:15 PM
(This post was last modified: Nov-11-2021, 10:07 PM by Yoriz. Edit Reason: Added code tags)
Hello, I'm new here and new to Python. I am trying to write a function that displays only the non-matching and/or missing rows between two datasets. Some of the datasets will have thousands of rows. The closest I have gotten so far is merging two pandas DataFrames, which shows all rows. Any advice on how to get only the non-matching rows, or on a better approach?
Example:
import pandas as pd
data1 = [('id1', 'desc1'), ('id2', 'desc2'), ('id3', 'desc3')]
data2 = [('id1', 'wrong description'), ('id2', 'desc2')]
df1 = pd.DataFrame(data1, columns=['ID', 'DESCRIPTION'])
df2 = pd.DataFrame(data2, columns=['ID', 'DESCRIPTION'])
diff = pd.merge(df1, df2, how='outer', on=['ID'], suffixes=('_SRC', '_TGT'), indicator=True)
print(diff)
Output:
    ID DESCRIPTION_SRC    DESCRIPTION_TGT
0  id1           desc1  wrong description
1  id2           desc2              desc2
2  id3           desc3                NaN
How would I get a similar result showing only rows 0 and 2? Thank you.
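One possible way to narrow the merged result, sketched here rather than offered as the definitive answer: because the merge is called with indicator=True, pandas adds a _merge column marking each row as 'left_only', 'right_only', or 'both'. Rows missing from one side can be picked out from that column, and rows present on both sides can be compared on the two description columns. The names missing, changed, and mismatches are made up for this illustration.

```python
import pandas as pd

data1 = [('id1', 'desc1'), ('id2', 'desc2'), ('id3', 'desc3')]
data2 = [('id1', 'wrong description'), ('id2', 'desc2')]
df1 = pd.DataFrame(data1, columns=['ID', 'DESCRIPTION'])
df2 = pd.DataFrame(data2, columns=['ID', 'DESCRIPTION'])

diff = pd.merge(df1, df2, how='outer', on=['ID'],
                suffixes=('_SRC', '_TGT'), indicator=True)

# Rows found in only one of the two frames (flagged by the _merge column).
missing = diff['_merge'] != 'both'

# Rows found in both frames whose descriptions differ.
# Caveat: NaN != NaN is True, so a row with missing descriptions on
# both sides would also be reported as changed.
changed = (diff['_merge'] == 'both') & (
    diff['DESCRIPTION_SRC'] != diff['DESCRIPTION_TGT']
)

# Keep only the rows that are missing on one side or that differ.
mismatches = diff[missing | changed]
print(mismatches)
```

With the sample data above this should leave the id1 and id3 rows. For frames with several value columns, the same idea would need one comparison per column pair, so this is only a starting point.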