Python Forum
jupyter pandas remove duplicates help
#1
Hi there

Why is the number of rows still the same after I drop the duplicates?


data.shape
Output: (15631, 12)

data1 = data
data1.sort_values(by=['MACHINERYSTATUS','DATECREATED'])
data1.drop_duplicates(['CNTRNO'], keep='last')
data1.shape
Output: (15631, 12)
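A minimal reproduction may help show what is going on. It assumes a small, made-up DataFrame that reuses only the column names from this thread; the values are hypothetical. Two things stand out: `data1 = data` does not copy anything (both names point to the same object; `data.copy()` would make an independent copy), and `sort_values()` and `drop_duplicates()` each return a new DataFrame that is thrown away unless it is assigned.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
data = pd.DataFrame({
    "CNTRNO": ["A1", "A1", "B2", "B2", "C3"],
    "MACHINERYSTATUS": ["OK", "FAULT", "OK", "OK", "OK"],
    "DATECREATED": pd.to_datetime(
        ["2020-01-01", "2020-01-05", "2020-02-01", "2020-02-03", "2020-03-01"]
    ),
})

data1 = data                # no copy: data1 and data are the same object
print(data1 is data)        # True

# Both calls build and return a NEW DataFrame; it is not stored, so it is lost.
data1.sort_values(by=["MACHINERYSTATUS", "DATECREATED"])
data1.drop_duplicates(["CNTRNO"], keep="last")
print(data1.shape)          # (5, 3): unchanged

# Keeping the returned object is what actually reduces the row count.
deduped = data1.drop_duplicates(["CNTRNO"], keep="last")
print(deduped.shape)        # (3, 3)
```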
#2
Try checking which rows pandas flags as duplicates:

data1.duplicated()
or

print(data1.duplicated())
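I have played around with this function myself, and it's easy to confuse rows and columns. By default, duplicated() compares entire rows; to check a single column, pass it as the subset, e.g. data1.duplicated('CNTRNO').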
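To illustrate the rows-versus-columns point, here is a small sketch on hypothetical data (the values are made up; only `CNTRNO` and `MACHINERYSTATUS` are borrowed from the thread). It compares the default whole-row check with a check on `CNTRNO` alone, and shows how `keep="last"` changes which occurrences are flagged.

```python
import pandas as pd

# Hypothetical rows: the first two are identical in every column,
# the third repeats only the CNTRNO value.
df = pd.DataFrame({
    "CNTRNO": ["A1", "A1", "A1", "B2"],
    "MACHINERYSTATUS": ["OK", "OK", "FAULT", "OK"],
})

# Default: a row is flagged only if ALL columns match an earlier row.
whole_row = df.duplicated()
print(whole_row.tolist())        # [False, True, False, False]

# subset="CNTRNO": compare only that column.
by_cntrno = df.duplicated(subset="CNTRNO")
print(by_cntrno.tolist())        # [False, True, True, False]

# keep="last" flags every occurrence except the last one of each CNTRNO.
by_cntrno_last = df.duplicated(subset="CNTRNO", keep="last")
print(by_cntrno_last.tolist())   # [True, True, False, False]
```

With `keep="last"`, the rows marked `False` are exactly the ones `drop_duplicates(subset="CNTRNO", keep="last")` would keep.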
#3
Hi

After dropping the duplicates, they are still there:

data.sort_values(by=['CNTRNO','DATECREATED'])
data.drop_duplicates(['CNTRNO'], keep='last')
data.duplicated('CNTRNO')
Output:
0     False
1     False
2      True
3      True
4      True
5     False
6      True
7      True
8      True
9      True
10     True
11     True
12     True
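Given output like the above, it can help to work out in advance how many rows should remain after deduplicating on `CNTRNO`, and then confirm the result matches. This is a sketch on a hypothetical stand-in for `data` (made-up values); the checks use only `duplicated()`, `nunique()` and `drop_duplicates()`.

```python
import pandas as pd

# Hypothetical stand-in for the thread's data.
data = pd.DataFrame({
    "CNTRNO": ["A1", "A1", "A1", "B2", "C3", "C3"],
    "DATECREATED": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03",
         "2020-02-01", "2020-03-01", "2020-03-02"]
    ),
})

n_dupes = data.duplicated("CNTRNO").sum()   # rows that repeat an earlier CNTRNO
n_unique = data["CNTRNO"].nunique()         # distinct container numbers

# Both figures predict the size of the deduplicated frame.
expected_rows = len(data) - n_dupes
print(n_dupes, n_unique, expected_rows)     # 3 3 3

# Check the ASSIGNED result, not the original frame.
result = data.drop_duplicates("CNTRNO", keep="last")
print(len(result) == expected_rows == n_unique)  # True
print(result.duplicated("CNTRNO").any())    # False
```

Running the same `duplicated("CNTRNO")` check on `data` itself would still show `True` values, because `data` was never modified.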
#4
Hi again

Try changing your second line:
data.drop_duplicates(['CNTRNO'], keep='last')
To:
data3 = data.drop_duplicates(['CNTRNO'], keep='last')
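Then see how the new DataFrame, a modified copy of 'data', behaves.
This works because drop_duplicates() does not modify 'data' in place: by default (inplace=False) it returns a new DataFrame, which is discarded unless you assign it to a name. sort_values() behaves the same way.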

Next time, consider posting a small sample of your DataFrame (for example, the output of data.head()) so others can see what the data looks like.
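The change above fixes the drop_duplicates() line, but the sort_values() line has the same problem, so the sort never affects which row `keep='last'` keeps. A minimal sketch of the whole fix, assuming the goal is the latest row per `CNTRNO` (the sample values are hypothetical), chains both steps and keeps the result. Both methods also accept `inplace=True`, though assigning the returned DataFrame is the more common pattern.

```python
import pandas as pd

# Hypothetical sample; column names are from the thread, values are made up.
data = pd.DataFrame({
    "CNTRNO": ["A1", "B2", "A1", "B2", "C3"],
    "DATECREATED": pd.to_datetime(
        ["2020-01-05", "2020-02-01", "2020-01-01", "2020-02-03", "2020-03-01"]
    ),
    "MACHINERYSTATUS": ["FAULT", "OK", "OK", "OK", "OK"],
})

# Sort so the newest row of each container comes last, then keep that row.
# Every step returns a new DataFrame, so the chain's result is assigned.
latest = (
    data.sort_values(by=["CNTRNO", "DATECREATED"])
        .drop_duplicates(subset=["CNTRNO"], keep="last")
        .reset_index(drop=True)
)

print(data.shape)     # (5, 3): the original is untouched
print(latest.shape)   # (3, 3)
print(latest["CNTRNO"].duplicated().any())  # False
```

Here `latest` holds one row per container, and for `A1` it is the 2020-01-05 row, the newer of its two entries.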