 How to keep last entry from duplicate entries
#1
Hi,
I have data (a DataFrame) as shown below. There are some duplicate entries within the same day, and I want to keep only the last entry based on specified columns.
Name   Category  Mode     data_updated        status
A      B         Normal   st_NK_20200112_01   STRY
N      L         STD      st_NK_20200112_01   SJK_check
P      Y         Normal   st_NK_20200112_01   SPL_check
N      L         STD      st_NK_20200113_00   SJK_check
N      L         STD      st_NK_20200113_01   SJK_check
A      B         Normal   st_NK_20200113_01   STRY
A      B         Normal   st_NK_20200113_02   STRY
A      B         Normal   st_NK_20200113_03   STRY
I want to keep the last entry based on columns 1, 2 and 5 (Name, Category and status).

Desired output:
Name   Category  Mode     data_updated        status
P      Y         Normal   st_NK_20200112_01   SPL_check
N      L         STD      st_NK_20200113_01   SJK_check
A      B         Normal   st_NK_20200113_03   STRY
I could not find a way to do this; kindly help.
#2
One way to keep the last row of the filtered rows is to use .tail(n=1):

>>> import pandas as pd
>>> df = pd.DataFrame([1, 2, 1, 2, 3, 3])
>>> df
   0
0  1
1  2
2  1
3  2
4  3
5  3
>>> df.loc[df[0] == 2].tail(n=1)
   0
3  2
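Applied to the question's data, the same idea can be run for every duplicate group at once with groupby(...).tail(1), which keeps the last row of each group. This is a sketch, not tested against the poster's real data; it assumes the table is typed in by hand instead of being read from the clipboard, and that the grouping columns are Name, Category and status:

```python
import pandas as pd

# Sample data typed in from the question (instead of pd.read_clipboard())
df = pd.DataFrame({
    "Name":     ["A", "N", "P", "N", "N", "A", "A", "A"],
    "Category": ["B", "L", "Y", "L", "L", "B", "B", "B"],
    "Mode":     ["Normal", "STD", "Normal", "STD", "STD",
                 "Normal", "Normal", "Normal"],
    "data_updated": [
        "st_NK_20200112_01", "st_NK_20200112_01", "st_NK_20200112_01",
        "st_NK_20200113_00", "st_NK_20200113_01", "st_NK_20200113_01",
        "st_NK_20200113_02", "st_NK_20200113_03",
    ],
    "status": ["STRY", "SJK_check", "SPL_check", "SJK_check",
               "SJK_check", "STRY", "STRY", "STRY"],
})

# GroupBy.tail(1) keeps the last row of each (Name, Category, status) group;
# the surviving rows keep their original index labels and order.
result = df.groupby(["Name", "Category", "status"]).tail(1)
print(result)
```

Unlike filtering one value at a time, this handles all duplicate groups in a single call, and the kept rows (index 2, 4 and 7 here) match the desired output.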
#3
Try drop_duplicates with the keep='last' parameter:
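import pandas as pd

# Read the table copied to the clipboard (whitespace-separated columns)
df = pd.read_clipboard()
# Keep only the last row of each Name/Category/status combination
result_df = df.drop_duplicates(subset=['Name', 'Category', 'status'], keep='last')
print(result_df)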
Output:
   Name  Category    Mode       data_updated     status
2     P         Y  Normal  st_NK_20200112_01  SPL_check
4     N         L     STD  st_NK_20200113_01  SJK_check
7     A         B  Normal  st_NK_20200113_03       STRY
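If the clipboard is not available, or the rows might not arrive in time order, a variant along these lines could help. It is a minimal sketch under two assumptions: the key columns are picked by position to match "column1, column2, column5" from the question, and data_updated always has the fixed-width st_NK_YYYYMMDD_NN shape, so sorting the strings also sorts them chronologically:

```python
import pandas as pd

# Question's table typed in by hand (pd.read_clipboard() needs a clipboard)
rows = [
    ("A", "B", "Normal", "st_NK_20200112_01", "STRY"),
    ("N", "L", "STD",    "st_NK_20200112_01", "SJK_check"),
    ("P", "Y", "Normal", "st_NK_20200112_01", "SPL_check"),
    ("N", "L", "STD",    "st_NK_20200113_00", "SJK_check"),
    ("N", "L", "STD",    "st_NK_20200113_01", "SJK_check"),
    ("A", "B", "Normal", "st_NK_20200113_01", "STRY"),
    ("A", "B", "Normal", "st_NK_20200113_02", "STRY"),
    ("A", "B", "Normal", "st_NK_20200113_03", "STRY"),
]
df = pd.DataFrame(rows, columns=["Name", "Category", "Mode", "data_updated", "status"])

# "column1, column2, column5" from the question, selected by position
key_cols = list(df.columns[[0, 1, 4]])  # ['Name', 'Category', 'status']

# keep='last' means "last in the current row order", so sort by timestamp first.
# Assumption: data_updated is always st_NK_YYYYMMDD_NN, so a string sort is
# chronological. mergesort is stable, so ties keep their original order.
ordered = df.sort_values("data_updated", kind="mergesort")
result = ordered.drop_duplicates(subset=key_cols, keep="last").sort_index()
print(result)
```

The sort is what makes keep='last' mean "most recent" rather than "lowest in the file"; the final sort_index() only restores the original row order of the surviving rows.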