Python Forum

Full Version: How to mark duplicate rows in pandas
Hi,
I have the pandas dataframe below:

ID	Pop	SG	time	            Stg	Rank   Name
A.1	T1	A.0	2020-08-01 10:45:00	VG	1	   LA
A.2	T1	A.0	2020-08-02 10:45:34	VG	3	   NT
K.6	T1	K.0	2020-08-03 10:45:20	BN	5	   PX
A.2	T1	A.0	2020-08-04 13:03:55	VG	8	   BN
K.3	T1	K.0	2020-08-05 14:45:13	BN	1	   LA
K.7	T1	K.0	2020-08-06 15:45:43	BN	0	   NN
K.3	T1	K.0	2020-08-07 15:45:34	BN	3	   CK
A.2	T1	H.0	2020-08-08 16:45:00	PP	8	   BN
I want to mark a row as DUP if its ID, Pop, SG and Stg match an earlier row (only the time and other columns differ); otherwise it should be marked NOR.

Desired output:

ID	Pop	SG	time	            Stg	Rank   Name	Status
A.1	T1	A.0	2020-08-01 10:45:00	VG	1	   LA	NOR
A.2	T1	A.0	2020-08-02 10:45:34	VG	3	   NT	NOR
K.6	T1	K.0	2020-08-03 10:45:20	BN	5	   PX	NOR
A.2	T1	A.0	2020-08-04 13:03:55	VG	8	   BN	DUP
K.3	T1	K.0	2020-08-05 14:45:13	BN	1	   LA	NOR
K.7	T1	K.0	2020-08-06 15:45:43	BN	0	   NN	NOR
K.3	T1	K.0	2020-08-07 15:45:34	BN	3	   CK	DUP
A.2	T1	H.0	2020-08-08 16:45:00	PP	8	   BN	NOR
Is there a DataFrame method for this? Please help.
What about df.duplicated(['ID', 'Pop', 'SG','Stg'])?
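For reference, here is a minimal sketch of what that call returns on the sample data. The frame is rebuilt by hand with only the four key columns (an assumption for brevity). With the default keep='first', duplicated() flags every repeat after the first occurrence.

```python
import pandas as pd

# Sample data from the question, reduced to the four key columns.
df = pd.DataFrame({
    'ID':  ['A.1', 'A.2', 'K.6', 'A.2', 'K.3', 'K.7', 'K.3', 'A.2'],
    'Pop': ['T1'] * 8,
    'SG':  ['A.0', 'A.0', 'K.0', 'A.0', 'K.0', 'K.0', 'K.0', 'H.0'],
    'Stg': ['VG', 'VG', 'BN', 'VG', 'BN', 'BN', 'BN', 'PP'],
})

# Boolean Series: True where the key columns repeat an earlier row.
mask = df.duplicated(['ID', 'Pop', 'SG', 'Stg'])
print(mask.tolist())
# [False, False, False, True, False, False, True, False]
```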
I tried as below:

idx= df.duplicated(['ID', 'Pop', 'SG','Stg']).tolist()
indexes = [n for n,x in enumerate(idx) if x==True]
df['new_col']='NOR'
df['new_col'].iloc[indexes]='DUP'
but I get the warning below:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/sta...ersus-copy
self._setitem_with_indexer(indexer, value)
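The warning most likely comes from the chained indexing in df['new_col'].iloc[indexes] = 'DUP': the column is selected first, then assigned into, and pandas cannot guarantee the write reaches the original frame. A minimal sketch that keeps the positional approach but uses a single indexing call on df itself (the frame is rebuilt by hand from the sample data with only the key columns, which is an assumption for brevity):

```python
import pandas as pd

# Sample data from the question, reduced to the four key columns.
df = pd.DataFrame({
    'ID':  ['A.1', 'A.2', 'K.6', 'A.2', 'K.3', 'K.7', 'K.3', 'A.2'],
    'Pop': ['T1'] * 8,
    'SG':  ['A.0', 'A.0', 'K.0', 'A.0', 'K.0', 'K.0', 'K.0', 'H.0'],
    'Stg': ['VG', 'VG', 'BN', 'VG', 'BN', 'BN', 'BN', 'PP'],
})

idx = df.duplicated(['ID', 'Pop', 'SG', 'Stg']).tolist()
indexes = [n for n, x in enumerate(idx) if x]

df['new_col'] = 'NOR'
# One .iloc call on the frame: rows by position, column by its position.
df.iloc[indexes, df.columns.get_loc('new_col')] = 'DUP'
print(df)
```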
# keep='first' (the default) leaves the first occurrence unflagged, matching the desired output
dup_mask = df.duplicated(subset=['ID', 'Pop', 'SG', 'Stg'], keep='first')
df.loc[dup_mask, 'new_col'] = 'DUP'
df.loc[~dup_mask, 'new_col'] = 'NOR'
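As an alternative, the column can be filled in one step with numpy.where, which avoids both chained indexing and the two separate assignments. This is only a sketch: the frame is rebuilt by hand from the sample data, and the Status column name is taken from the desired output in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'ID':   ['A.1', 'A.2', 'K.6', 'A.2', 'K.3', 'K.7', 'K.3', 'A.2'],
    'Pop':  ['T1'] * 8,
    'SG':   ['A.0', 'A.0', 'K.0', 'A.0', 'K.0', 'K.0', 'K.0', 'H.0'],
    'time': pd.to_datetime([
        '2020-08-01 10:45:00', '2020-08-02 10:45:34', '2020-08-03 10:45:20',
        '2020-08-04 13:03:55', '2020-08-05 14:45:13', '2020-08-06 15:45:43',
        '2020-08-07 15:45:34', '2020-08-08 16:45:00',
    ]),
    'Stg':  ['VG', 'VG', 'BN', 'VG', 'BN', 'BN', 'BN', 'PP'],
    'Rank': [1, 3, 5, 8, 1, 0, 3, 8],
    'Name': ['LA', 'NT', 'PX', 'BN', 'LA', 'NN', 'CK', 'BN'],
})

keys = ['ID', 'Pop', 'SG', 'Stg']
# Rows repeating an earlier key combination become DUP, all others NOR.
df['Status'] = np.where(df.duplicated(subset=keys), 'DUP', 'NOR')
print(df[['ID', 'SG', 'Stg', 'Status']])
```

duplicated() depends on row order, so if the rows are not already in time order, sorting with df.sort_values('time') first keeps the earliest row of each group as NOR.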