Python Forum
How to mark duplicate rows in pandas
#1
Hi,
I have the pandas DataFrame below:

ID   Pop  SG   time                 Stg  Rank  Name
A.1  T1   A.0  2020-08-01 10:45:00  VG   1     LA
A.2  T1   A.0  2020-08-02 10:45:34  VG   3     NT
K.6  T1   K.0  2020-08-03 10:45:20  BN   5     PX
A.2  T1   A.0  2020-08-04 13:03:55  VG   8     BN
K.3  T1   K.0  2020-08-05 14:45:13  BN   1     LA
K.7  T1   K.0  2020-08-06 15:45:43  BN   0     NN
K.3  T1   K.0  2020-08-07 15:45:34  BN   3     CK
A.2  T1   H.0  2020-08-08 16:45:00  PP   8     BN
I want to mark a row as DUP if its ID, Pop, SG, and Stg values are the same as those of an earlier row (regardless of time); otherwise the row should be marked NOR.

Desired output:

ID   Pop  SG   time                 Stg  Rank  Name  Status
A.1  T1   A.0  2020-08-01 10:45:00  VG   1     LA    NOR
A.2  T1   A.0  2020-08-02 10:45:34  VG   3     NT    NOR
K.6  T1   K.0  2020-08-03 10:45:20  BN   5     PX    NOR
A.2  T1   A.0  2020-08-04 13:03:55  VG   8     BN    DUP
K.3  T1   K.0  2020-08-05 14:45:13  BN   1     LA    NOR
K.7  T1   K.0  2020-08-06 15:45:43  BN   0     NN    NOR
K.3  T1   K.0  2020-08-07 15:45:34  BN   3     CK    DUP
A.2  T1   H.0  2020-08-08 16:45:00  PP   8     BN    NOR
Is there a DataFrame method for this? Please help.
#2
What about df.duplicated(['ID', 'Pop', 'SG', 'Stg'])?
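
A minimal sketch of what that call returns on the sample data from post #1. The frame is rebuilt by hand here, and the time, Rank and Name columns are left out for brevity (an assumption; they do not affect the result because they are not in the subset).

```python
import pandas as pd

# Sample rows from post #1 (time, Rank and Name omitted for brevity)
df = pd.DataFrame({
    'ID':  ['A.1', 'A.2', 'K.6', 'A.2', 'K.3', 'K.7', 'K.3', 'A.2'],
    'Pop': ['T1'] * 8,
    'SG':  ['A.0', 'A.0', 'K.0', 'A.0', 'K.0', 'K.0', 'K.0', 'H.0'],
    'Stg': ['VG', 'VG', 'BN', 'VG', 'BN', 'BN', 'BN', 'PP'],
})

# True for every row whose key columns repeat an earlier row
mask = df.duplicated(['ID', 'Pop', 'SG', 'Stg'])
print(mask.tolist())
# [False, False, False, True, False, False, True, False]
```

The two True positions line up with the rows marked DUP in the desired output.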
#3
I tried the following:

idx= df.duplicated(['ID', 'Pop', 'SG','Stg']).tolist()
indexes = [n for n,x in enumerate(idx) if x==True]
df['new_col']='NOR'
df['new_col'].iloc[indexes]='DUP'
but I get the warning below:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/sta...ersus-copy
self._setitem_with_indexer(indexer, value)
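
The warning is typically triggered by chained indexing: df['new_col'] selects a column first, and .iloc[...] = ... then assigns into that intermediate object, which may or may not be a view of df. A hedged sketch of one way to keep the same integer positions but assign in a single indexing call on the DataFrame itself; the three-row frame is made up for illustration.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, same key columns as above)
df = pd.DataFrame({
    'ID':  ['A.1', 'A.2', 'A.2'],
    'Pop': ['T1', 'T1', 'T1'],
    'SG':  ['A.0', 'A.0', 'A.0'],
    'Stg': ['VG', 'VG', 'VG'],
})

idx = df.duplicated(['ID', 'Pop', 'SG', 'Stg']).tolist()
indexes = [n for n, x in enumerate(idx) if x]

df['new_col'] = 'NOR'
# One .iloc call on df: rows by position, column by its integer position
df.iloc[indexes, df.columns.get_loc('new_col')] = 'DUP'
print(df['new_col'].tolist())
# ['NOR', 'NOR', 'DUP']
```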
#4
# keep='first' (the default) leaves the first occurrence unflagged, matching the desired output
dup = df.duplicated(subset=['ID', 'Pop', 'SG', 'Stg'], keep='first')
df.loc[dup, 'new_col'] = 'DUP'
df.loc[~dup, 'new_col'] = 'NOR'
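
The keep argument of duplicated decides which members of a duplicate group are flagged: keep='first' (the default) leaves the first occurrence unflagged, while keep=False flags every member. A small sketch comparing the two, using numpy.where as an alternative to two .loc assignments; the frame and the column names later_only and all_members are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data)
df = pd.DataFrame({
    'ID':  ['A.1', 'A.2', 'A.2', 'K.3'],
    'Pop': ['T1', 'T1', 'T1', 'T1'],
    'SG':  ['A.0', 'A.0', 'A.0', 'K.0'],
    'Stg': ['VG', 'VG', 'VG', 'BN'],
})
keys = ['ID', 'Pop', 'SG', 'Stg']

# Only repeats after the first occurrence are flagged
df['later_only'] = np.where(df.duplicated(subset=keys, keep='first'), 'DUP', 'NOR')
# Every row that belongs to a duplicate group is flagged
df['all_members'] = np.where(df.duplicated(subset=keys, keep=False), 'DUP', 'NOR')

print(df['later_only'].tolist())   # ['NOR', 'NOR', 'DUP', 'NOR']
print(df['all_members'].tolist())  # ['NOR', 'DUP', 'DUP', 'NOR']
```

The desired output in post #1 corresponds to the later_only behaviour.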