Python Forum
pandas data frame - Printable Version



pandas data frame - dervast - Aug-28-2019

Hi all, I would like to drop all rows whose value in a specific column (here ID) occurs only once.
An example is below:
import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0], [20205, 0], [20205, 0], [20205, 1], [20205, 1], [80215, 1]]

test = pd.DataFrame(data, columns=["ID", "label"])

test
Out[65]: 
      ID  label
0  10105      1
1  10105      1
2  10105      0
3  20205      0
4  20205      0
5  20205      1
6  20205      1
7  80215      1
I would like to keep all rows except the last one, since its ID value occurs only once. All the other rows should stay.

Any ideas?
Thanks
Alex


RE: pandas data frame - ThomasL - Aug-28-2019

You already know groupby and .count; together they give the number of rows per ID:
test.groupby('ID').count().index
# Int64Index([10105, 20205, 80215], dtype='int64', name='ID')

test.groupby('ID').count().values.flatten()
# array([3, 4, 1], dtype=int64)
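
Those per-ID counts can be turned into a row filter. The following is a minimal sketch of one way to do it, not necessarily what the poster had in mind: it swaps .count for transform("size") so the group size is aligned with every original row, then keeps the rows whose ID occurs more than once. The names group_sizes, result and result_alt are made up for illustration.

```python
import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0], [20205, 0],
        [20205, 0], [20205, 1], [20205, 1], [80215, 1]]
test = pd.DataFrame(data, columns=["ID", "label"])

# For every row, the number of rows that share its ID
# (same index as `test`, so it can be used as a mask).
group_sizes = test.groupby("ID")["ID"].transform("size")

# Keep only rows whose ID occurs more than once.
result = test[group_sizes > 1]

# Alternative without groupby: keep=False marks every member of a
# duplicated ID, so IDs that occur once are the only ones left unmarked.
result_alt = test[test.duplicated("ID", keep=False)]

print(result)
```

Both variants keep the original index, so the row with ID 80215 (index 7) is simply missing from the result; call reset_index(drop=True) if a fresh 0..n-1 index is wanted.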