Python Forum

Hi there

Why is the number of rows still the same after dropping the duplicates?

data.shape

output: (15631, 12)

data1 = data
data1.sort_values(by=['MACHINERYSTATUS','DATECREATED'])
data1.drop_duplicates(['CNTRNO'], keep='last')
data1.shape

output: (15631, 12)

Try checking which rows pandas flags as duplicates:

data1.duplicated()

print(data1.duplicated())

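I have played around with this function myself now, and it's easy to confuse rows and columns. Without arguments, duplicated() compares whole rows, so a row only counts as a duplicate if every column matches an earlier row.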
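To illustrate the row-versus-column point, here is a minimal sketch on a small made-up dataframe. Only the column names come from the thread; the values are invented for the example.

```python
import pandas as pd

# Made-up sample data; only the column names are taken from the thread.
data1 = pd.DataFrame({
    "CNTRNO": ["A1", "A1", "B2", "B2", "C3"],
    "DATECREATED": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03", "2020-01-01"]
    ),
})

# No subset: a row is flagged only if ALL of its columns match an earlier row.
full_row_dupes = data1.duplicated()

# With subset: only CNTRNO is compared, so repeated container numbers are flagged.
cntrno_dupes = data1.duplicated(subset=["CNTRNO"])

print(full_row_dupes.sum())  # 0
print(cntrno_dupes.sum())    # 2
```

So if duplicated() shows nothing but False, it may just mean no two rows are identical in every column.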

Hi

After dropping the duplicates, they are still there:

data.sort_values(by=['CNTRNO','DATECREATED'])
data.drop_duplicates(['CNTRNO'], keep='last')
data.duplicated('CNTRNO')

Output:
0        False
1        False
2         True
3         True
4         True
5        False
6         True
7         True
8         True
9         True
10        True
11        True
12        True

Hi again

Try and change your second line:

data.drop_duplicates(['CNTRNO'], keep='last')

To:

data3 = data.drop_duplicates(['CNTRNO'], keep='last')

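And see how the new dataframe, a modified copy of 'data', behaves.
The 'data' df is not changed by drop_duplicates: by default the method returns a new dataframe and leaves the original as it was, so the result has to be assigned to a variable. The same goes for sort_values.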
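Here is a minimal sketch of assigning the results, again on made-up data (only the column names come from the thread). The name data_sorted is just an illustrative choice.

```python
import pandas as pd

# Made-up sample data; only the column names are taken from the thread.
data = pd.DataFrame({
    "CNTRNO": ["A1", "A1", "B2", "B2", "C3"],
    "DATECREATED": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03", "2020-01-01"]
    ),
    "MACHINERYSTATUS": ["ON", "OFF", "ON", "ON", "OFF"],
})

# Both methods return a new dataframe by default, so keep the results.
data_sorted = data.sort_values(by=["CNTRNO", "DATECREATED"])
data3 = data_sorted.drop_duplicates(subset=["CNTRNO"], keep="last")

print(data.shape)   # (5, 3) - the original is untouched
print(data3.shape)  # (3, 3) - one row per CNTRNO, the latest DATECREATED
print(data3.duplicated(subset=["CNTRNO"]).any())  # False
```

Both methods also accept inplace=True if you would rather modify 'data' directly. Also note that data1 = data does not make a copy, it only gives the same dataframe a second name; use data.copy() if you want an independent one.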

Another time you could consider showing a subset of your df graphically.

okl

glidecode
