[Pandas] Help with finding only non-matching rows

LowEnd

Hello, new here and new to python. I am trying to write a function which will display only non-matched and/or missing rows between two datasets. Some of the datasets will be in the thousands of rows. Closest I can get thus far is with two pandas dataframes using merge, which shows all rows but any advice to only get non-matched or a better approach?

Example:

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
diff=pd.merge(df1,df2,how='outer',on=['ID'],suffixes=('_SRC','_TGT'),indicator=True)
print(diff)

Output:    ID DESCRIPTION_SRC    DESCRIPTION_TGT
0  id1           desc1  wrong description
1  id2           desc2              desc2
2  id3           desc3                NaN

How would I get similar result with only row 0 and row 2?

Thank you.

Yoriz write Nov-11-2021, 09:29 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

jefsummers · Nov-11-2021, 10:17 PM

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
print(pd.merge(df1,df2, indicator=True, how='outer')
         .query('_merge=="left_only"')
         .drop('_merge', axis=1))

Output:  ID DESCRIPTION
0  id1       desc1
2  id3       desc3

Does this work?

LowEnd

Thank you so much! It is closer, but I am wanting to capture the differences between source and target. I am wanting output something like:

Output:    ID DESCRIPTION_SRC    DESCRIPTION_TGT
0  id1           desc1  wrong description
2  id3           desc3                NaN

jefsummers · Nov-12-2021, 02:34 PM

import pandas as pd
data1=[('id1','desc1'),('id2','desc2'),('id3','desc3')]
data2=[('id1','wrong description'),('id2','desc2')]
df1=pd.DataFrame(data1,columns=['ID','DESCRIPTION'])
df2=pd.DataFrame(data2,columns=['ID','DESCRIPTION'])
df1a = df1.copy()
df1a['Desc2']=df2['DESCRIPTION']
df1a['Different'] = (df1a['DESCRIPTION'] != df1a['Desc2'])
df1a = df1a[df1a.Different == True]
df1a

Output:ID	DESCRIPTION	Desc2	Different
0	id1	desc1	wrong description	True
2	id3	desc3	NaN	True

You can then strip away the Different column.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Find duplicates in a pandas dataframe list column on other rows	Calab	2	2,212	Sep-18-2024, 07:38 PM Last Post: Calab
	Merging rows and adding columns based on matching index	pythonnewbie78	3	1,806	Dec-24-2023, 11:51 AM Last Post: Pedroski55
	Pandas Dataframe Filtering based on rows	mvdlm	0	2,089	Apr-02-2022, 06:39 PM Last Post: mvdlm
	pandas head() not reading all rows	naab	0	2,464	Apr-07-2020, 01:06 PM Last Post: naab
	How to add a few empty rows into a pandas dataframe	python_newbie09	2	18,515	Sep-20-2019, 08:52 AM Last Post: python_newbie09
	Subtract rows (like r[1]-r[2] and r[3]-r[3]) and no pandas	pradeepkumarbe	1	3,159	Dec-18-2018, 01:16 PM Last Post: ichabod801
	Write specific rows from pandas dataframe to csv file	pradeepkumarbe	3	7,144	Oct-18-2018, 09:33 PM Last Post: volcano63
	Output substrings from rows in pandas	brocq_18	5	4,995	Jun-21-2018, 11:30 AM Last Post: brocq_18
	Finding date count from a list of date range in pandas	trillerducas72	0	3,293	May-24-2018, 02:30 AM Last Post: trillerducas72
	finding exact and similar matches from pandas dataframe?	PrateekG	0	4,896	Apr-22-2018, 01:22 PM Last Post: PrateekG

[Pandas] Help with finding only non-matching rows

User Panel Messages

Announcements