Python Forum
[Pandas] Help with finding only non-matching rows
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Pandas] Help with finding only non-matching rows
#1
Hello, new here and new to python. I am trying to write a function which will display only non-matched and/or missing rows between two datasets. Some of the datasets will be in the thousands of rows. Closest I can get thus far is with two pandas dataframes using merge, which shows all rows but any advice to only get non-matched or a better approach?

Example:

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
diff=pd.merge(df1,df2,how='outer',on=['ID'],suffixes=('_SRC','_TGT'),indicator=True)
print(diff)
Output:
ID DESCRIPTION_SRC DESCRIPTION_TGT 0 id1 desc1 wrong description 1 id2 desc2 desc2 2 id3 desc3 NaN
How would I get similar result with only row 0 and row 2?

Thank you.
Yoriz write Nov-11-2021, 09:29 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Reply
#2
import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
print(pd.merge(df1,df2, indicator=True, how='outer')
         .query('_merge=="left_only"')
         .drop('_merge', axis=1))
Output:
ID DESCRIPTION 0 id1 desc1 2 id3 desc3
Does this work?
Reply
#3
Thank you so much! It is closer, but I am wanting to capture the differences between source and target. I am wanting output something like:

Output:
ID DESCRIPTION_SRC DESCRIPTION_TGT 0 id1 desc1 wrong description 2 id3 desc3 NaN
Reply
#4
import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
df1a = df1.copy()
df1a['Desc2']=df2['DESCRIPTION']
df1a['Different'] = (df1a['DESCRIPTION'] != df1a['Desc2'])
df1a = df1a[df1a.Different == True]
df1a
Output:
ID DESCRIPTION Desc2 Different 0 id1 desc1 wrong description True 2 id3 desc3 NaN True
You can then strip away the Different column.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Merging rows and adding columns based on matching index pythonnewbie78 3 817 Dec-24-2023, 11:51 AM
Last Post: Pedroski55
  Pandas Dataframe Filtering based on rows mvdlm 0 1,439 Apr-02-2022, 06:39 PM
Last Post: mvdlm
  pandas head() not reading all rows naab 0 1,809 Apr-07-2020, 01:06 PM
Last Post: naab
  How to add a few empty rows into a pandas dataframe python_newbie09 2 16,335 Sep-20-2019, 08:52 AM
Last Post: python_newbie09
  Subtract rows (like r[1]-r[2] and r[3]-r[3]) and no pandas pradeepkumarbe 1 2,602 Dec-18-2018, 01:16 PM
Last Post: ichabod801
  Write specific rows from pandas dataframe to csv file pradeepkumarbe 3 5,504 Oct-18-2018, 09:33 PM
Last Post: volcano63
  Output substrings from rows in pandas brocq_18 5 3,965 Jun-21-2018, 11:30 AM
Last Post: brocq_18
  Finding date count from a list of date range in pandas trillerducas72 0 2,747 May-24-2018, 02:30 AM
Last Post: trillerducas72
  finding exact and similar matches from pandas dataframe? PrateekG 0 4,221 Apr-22-2018, 01:22 PM
Last Post: PrateekG
  pandas restricting csv read to certain rows metalray 5 20,815 Dec-16-2017, 07:39 AM
Last Post: metalray

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020