How to mark duplicate rows in pandas
Python Forum (https://python-forum.io), General Coding Help, thread /thread-29642.html
How to mark duplicate rows in pandas - Mekala - Sep-14-2020

Hi, I have the pandas dataframe below:

    ID   Pop  SG   time                 Stg  Rank  Name
    A.1  T1   A.0  2020-08-01 10:45:00  VG   1     LA
    A.2  T1   A.0  2020-08-02 10:45:34  VG   3     NT
    K.6  T1   K.0  2020-08-03 10:45:20  BN   5     PX
    A.2  T1   A.0  2020-08-04 13:03:55  VG   8     BN
    K.3  T1   K.0  2020-08-05 14:45:13  BN   1     LA
    K.7  T1   K.0  2020-08-06 15:45:43  BN   0     NN
    K.3  T1   K.0  2020-08-07 15:45:34  BN   3     CK
    A.2  T1   H.0  2020-08-08 16:45:00  PP   8     BN

I want to mark a row as DUP if ID, Pop, SG and Stg are the same as in another row (everything except time), and NOR otherwise.

Desired output:

    ID   Pop  SG   time                 Stg  Rank  Name  Status
    A.1  T1   A.0  2020-08-01 10:45:00  VG   1     LA    NOR
    A.2  T1   A.0  2020-08-02 10:45:34  VG   3     NT    NOR
    K.6  T1   K.0  2020-08-03 10:45:20  BN   5     PX    NOR
    A.2  T1   A.0  2020-08-04 13:03:55  VG   8     BN    DUP
    K.3  T1   K.0  2020-08-05 14:45:13  BN   1     LA    NOR
    K.7  T1   K.0  2020-08-06 15:45:43  BN   0     NN    NOR
    K.3  T1   K.0  2020-08-07 15:45:34  BN   3     CK    DUP
    A.2  T1   H.0  2020-08-08 16:45:00  PP   8     BN    NOR

Is there a DataFrame method for this? Please help.

RE: How to mark duplicate rows in pandas - scidam - Sep-15-2020

What about df.duplicated(['ID', 'Pop', 'SG', 'Stg'])?
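For reference, here is a minimal sketch of what that call returns on the sample data. The way the frame is built is an assumption, since the thread does not show how `df` was created; the Rank and Name columns are left out because they play no part in the comparison.

```python
import pandas as pd

# Sample rows from the question, rebuilt by hand (construction is an assumption).
df = pd.DataFrame({
    "ID":   ["A.1", "A.2", "K.6", "A.2", "K.3", "K.7", "K.3", "A.2"],
    "Pop":  ["T1"] * 8,
    "SG":   ["A.0", "A.0", "K.0", "A.0", "K.0", "K.0", "K.0", "H.0"],
    "time": pd.to_datetime([
        "2020-08-01 10:45:00", "2020-08-02 10:45:34",
        "2020-08-03 10:45:20", "2020-08-04 13:03:55",
        "2020-08-05 14:45:13", "2020-08-06 15:45:43",
        "2020-08-07 15:45:34", "2020-08-08 16:45:00",
    ]),
    "Stg":  ["VG", "VG", "BN", "VG", "BN", "BN", "BN", "PP"],
})

# Only the listed columns are compared, so the differing time values are ignored.
# With the default keep="first", the first row of each group is not flagged.
mask = df.duplicated(subset=["ID", "Pop", "SG", "Stg"])

print(mask.tolist())
# → [False, False, False, True, False, False, True, False]
```

The result is a boolean Series aligned with the frame's index, so it can be used directly as a row selector.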
RE: How to mark duplicate rows in pandas - Mekala - Sep-17-2020

I tried as below:

    idx= df.duplicated(['ID', 'Pop', 'SG','Stg']).tolist()
    indexes = [n for n,x in enumerate(idx) if x==True]
    df['new_col']='NOR'
    df['new_col'].iloc[indexes]='DUP'

but there is a warning as below:

    A value is trying to be set on a copy of a slice from a DataFrame
    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      self._setitem_with_indexer(indexer, value)

RE: How to mark duplicate rows in pandas - scidam - Sep-17-2020

    df.loc[df.duplicated(subset=['ID', 'Pop', 'SG','Stg'], keep=False), 'new_col'] = 'dup'
    df.loc[~df.duplicated(subset=['ID', 'Pop', 'SG','Stg'], keep=False), 'new_col'] = 'Nor'
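A note on the `keep` argument, as a hedged sketch rather than the thread's accepted answer: `keep=False` flags every row of a duplicate group, including the first one, while the desired output in the question leaves the first occurrence as NOR, which corresponds to the default `keep="first"`. The sketch below shows both side by side. The frame construction and the names `key_cols`, `Status` and `Status_all` are assumptions for illustration. Assigning through a single `.loc` call also avoids the chained `df['new_col'].iloc[...]` pattern that triggered the warning.

```python
import pandas as pd

# Key columns of the sample data from the question (construction is an assumption).
df = pd.DataFrame({
    "ID":  ["A.1", "A.2", "K.6", "A.2", "K.3", "K.7", "K.3", "A.2"],
    "Pop": ["T1"] * 8,
    "SG":  ["A.0", "A.0", "K.0", "A.0", "K.0", "K.0", "K.0", "H.0"],
    "Stg": ["VG", "VG", "BN", "VG", "BN", "BN", "BN", "PP"],
})
key_cols = ["ID", "Pop", "SG", "Stg"]

# Default keep="first": only the second and later rows of a group are flagged.
later_only = df.duplicated(subset=key_cols)
df["Status"] = "NOR"
df.loc[later_only, "Status"] = "DUP"      # one .loc call, no chained indexing

# keep=False: every row that shares its key with another row is flagged.
all_members = df.duplicated(subset=key_cols, keep=False)
df["Status_all"] = "NOR"
df.loc[all_members, "Status_all"] = "DUP"

print(df[["ID", "SG", "Stg", "Status", "Status_all"]])
```

On this data the `Status` column reproduces the desired output from the first post, while `Status_all` additionally marks the first A.2 and K.3 rows as DUP.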