Python Forum

Full Version: How to keep last entry from duplicate entries
Hi,
I have data (a data frame) as shown below. There are some duplicate entries within the same day, and I want to keep the last entry based on specified columns.
Name   Category  Mode     data_updated        status
A      B         Normal   st_NK_20200112_01   STRY
N      L         STD      st_NK_20200112_01   SJK_check
P      Y         Normal   st_NK_20200112_01   SPL_check
N      L         STD      st_NK_20200113_00   SJK_check
N      L         STD      st_NK_20200113_01   SJK_check
A      B         Normal   st_NK_20200113_01   STRY
A      B         Normal   st_NK_20200113_02   STRY
A      B         Normal   st_NK_20200113_03   STRY
I want to keep the last entry based on column 1, column 2, and column 5 (Name, Category, status).

Desired output.
Name   Category  Mode     data_updated        status
P      Y         Normal   st_NK_20200112_01   SPL_check
N      L         STD      st_NK_20200113_01   SJK_check
A      B         Normal   st_NK_20200113_03   STRY
I could not find a method for this; kindly help.
One way to keep the last row from a set of filtered rows is to use .tail(n=1):

>>> import pandas as pd
>>> df = pd.DataFrame([1, 2, 1, 2, 3, 3])
>>> df
   0
0  1
1  2
2  1
3  2
4  3
5  3
>>> df.loc[df[0] == 2].tail(n=1)
   0
3  2
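The same .tail() idea can be applied per group instead of per filter. The following is only a sketch: it assumes the table from the question has been typed into a DataFrame by hand, that the grouping columns are Name, Category and status, and that the rows are already in the order in which they were updated. The variable name last_rows is just illustrative.

```python
import pandas as pd

# Data copied by hand from the question (assumed to be in update order).
df = pd.DataFrame({
    "Name": ["A", "N", "P", "N", "N", "A", "A", "A"],
    "Category": ["B", "L", "Y", "L", "L", "B", "B", "B"],
    "Mode": ["Normal", "STD", "Normal", "STD", "STD", "Normal", "Normal", "Normal"],
    "data_updated": [
        "st_NK_20200112_01", "st_NK_20200112_01", "st_NK_20200112_01",
        "st_NK_20200113_00", "st_NK_20200113_01", "st_NK_20200113_01",
        "st_NK_20200113_02", "st_NK_20200113_03",
    ],
    "status": ["STRY", "SJK_check", "SPL_check", "SJK_check",
               "SJK_check", "STRY", "STRY", "STRY"],
})

# Group on the key columns and keep the last row of each group.
# GroupBy.tail() keeps the original index and row order.
last_rows = df.groupby(["Name", "Category", "status"]).tail(1)
print(last_rows)
```

With this data the rows kept are the ones at index 2, 4 and 7, which matches the desired output in the question.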
Try drop_duplicates with the keep='last' parameter.
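import pandas as pd

# Copy the table from the first post to the clipboard before running this.
df = pd.read_clipboard()
# Rows count as duplicates when Name, Category and status all match; keep the last one.
result_df = df.drop_duplicates(subset=['Name', 'Category', 'status'], keep='last')
print(result_df)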
Output:
  Name Category    Mode       data_updated     status
2    P        Y  Normal  st_NK_20200112_01  SPL_check
4    N        L     STD  st_NK_20200113_01  SJK_check
7    A        B  Normal  st_NK_20200113_03       STRY
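If the clipboard is not available (for example on a headless machine), a rough equivalent is to parse the table from a string. This is a sketch under a few assumptions: the columns are whitespace-separated, and "last" means the highest data_updated value, so the frame is sorted on that column first. The sort step is my addition and can be dropped if the rows are already in update order; the names raw and result_df are only illustrative.

```python
import io

import pandas as pd

# Table text copied from the question; columns are separated by whitespace.
raw = """Name   Category  Mode     data_updated        status
A      B         Normal   st_NK_20200112_01   STRY
N      L         STD      st_NK_20200112_01   SJK_check
P      Y         Normal   st_NK_20200112_01   SPL_check
N      L         STD      st_NK_20200113_00   SJK_check
N      L         STD      st_NK_20200113_01   SJK_check
A      B         Normal   st_NK_20200113_01   STRY
A      B         Normal   st_NK_20200113_02   STRY
A      B         Normal   st_NK_20200113_03   STRY
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Assumption: data_updated strings sort chronologically (same prefix,
# zero-padded date and counter). A stable sort keeps ties in input order.
df = df.sort_values("data_updated", kind="stable")

# Keep only the last row for each (Name, Category, status) combination.
result_df = df.drop_duplicates(subset=["Name", "Category", "status"], keep="last")
print(result_df)
```

Note that keep='last' depends entirely on row order, so sorting first matters only when the input may arrive out of order.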