![]() |
[Pandas] Help with finding only non-matching rows - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: [Pandas] Help with finding only non-matching rows (/thread-35513.html) |
[Pandas] Help with finding only non-matching rows - LowEnd - Nov-11-2021 Hello, new here and new to python. I am trying to write a function which will display only non-matched and/or missing rows between two datasets. Some of the datasets will be in the thousands of rows. Closest I can get thus far is with two pandas dataframes using merge, which shows all rows but any advice to only get non-matched or a better approach? Example: import pandas as pd data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')] data2=[('id1','wrong description'),('id2','desc2')] df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION']) df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION']) diff=pd.merge(df1,df2,how='outer',on=['ID'],suffixes=('_SRC','_TGT'),indicator=True) print(diff) How would I get similar result with only row 0 and row 2? Thank you. RE: [Pandas] Help with finding only non-matching rows - jefsummers - Nov-11-2021 import pandas as pd data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')] data2=[('id1','wrong description'),('id2','desc2')] df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION']) df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION']) print(pd.merge(df1,df2, indicator=True, how='outer') .query('_merge=="left_only"') .drop('_merge', axis=1)) Does this work?
RE: [Pandas] Help with finding only non-matching rows - LowEnd - Nov-11-2021 Thank you so much! It is closer, but I am wanting to capture the differences between source and target. I am wanting output something like:
RE: [Pandas] Help with finding only non-matching rows - jefsummers - Nov-12-2021 import pandas as pd data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')] data2=[('id1','wrong description'),('id2','desc2')] df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION']) df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION']) df1a = df1.copy() df1a['Desc2']=df2['DESCRIPTION'] df1a['Different'] = (df1a['DESCRIPTION'] != df1a['Desc2']) df1a = df1a[df1a.Different == True] df1a You can then strip away the Different column.
|