Hi there
Why is the number of rows still the same after I drop the duplicates?
data.shape
output:(15631, 12)
data1 = data
data1.sort_values(by=['MACHINERYSTATUS','DATECREATED'])
data1.drop_duplicates(['CNTRNO'], keep='last')
data1.shape
output: (15631, 12)
Try checking which rows pandas flags as duplicates:
data1.duplicated()
or
print(data1.duplicated())
I have played around with this function myself now, and it's easy to confuse rows and columns.
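To make the check above concrete, here is a small sketch on a made-up frame. The values are invented and only the column names CNTRNO and DATECREATED come from this thread. It shows that duplicated() with no arguments compares whole rows, while duplicated('CNTRNO') compares only that one column, so the two can give different answers.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "CNTRNO": ["A1", "A1", "B2", "B2", "C3"],
    "DATECREATED": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-01", "2020-01-03"],
})

# Whole-row comparison: a row is flagged only if every column matches an earlier row.
full_row_dupes = df.duplicated()

# Single-column comparison: a row is flagged if its CNTRNO appeared earlier.
cntrno_dupes = df.duplicated("CNTRNO")

print(full_row_dupes.tolist())
print(cntrno_dupes.tolist())
```

If the first list is all False but the second is not, the rows differ in some other column, and a subset such as ['CNTRNO'] is needed when dropping.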
Hi
After dropping the duplicates, they are still there:
data.sort_values(by=['CNTRNO','DATECREATED'])
data.drop_duplicates(['CNTRNO'], keep='last')
data.duplicated('CNTRNO')
Output:
0 False
1 False
2 True
3 True
4 True
5 False
6 True
7 True
8 True
9 True
10 True
11 True
12 True
Hi again
Try and change your second line:
data.drop_duplicates(['CNTRNO'], keep='last')
To:
data3 = data.drop_duplicates(['CNTRNO'], keep='last')
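And see how the new dataframe, a modified copy of 'data', behaves.
The 'data' df itself is not changed: drop_duplicates returns a new dataframe and leaves the original alone unless you pass inplace=True. The same goes for sort_values, so that result needs to be assigned as well.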
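A minimal sketch of why the assignment matters, again on a made-up frame (the values and the name `deduped` are invented for illustration, and only the column names come from this thread). It assumes the goal is to keep the most recent row per CNTRNO.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
data = pd.DataFrame({
    "CNTRNO": ["A1", "B2", "A1", "C3", "B2"],
    "DATECREATED": pd.to_datetime(
        ["2020-01-02", "2020-01-01", "2020-01-01", "2020-01-03", "2020-01-05"]
    ),
})

# Calling the methods without keeping the result leaves `data` untouched.
data.sort_values(by=["CNTRNO", "DATECREATED"])
data.drop_duplicates(["CNTRNO"], keep="last")
rows_without_assignment = len(data)

# Keeping the returned frames is what actually removes the rows.
deduped = (
    data.sort_values(by=["CNTRNO", "DATECREATED"])
        .drop_duplicates(["CNTRNO"], keep="last")
)
rows_with_assignment = len(deduped)

print(rows_without_assignment, rows_with_assignment)
```

Passing inplace=True to both calls is an alternative to assigning the result. Note also that data1 = data does not make a copy; both names refer to the same dataframe, and data.copy() is what creates an independent one.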
Next time, you could consider showing a small subset of your df so it is easier to see what is going on.