[SOLVED on SO] Downsizing non-representative data in DataFrame

Posted on Python Forum (https://python-forum.io), Data Science forum (https://python-forum.io/forum-44.html)

[SOLVED on SO] Downsizing non-representative data in DataFrame - volcano63 - Sep-27-2018

Hi, folks,
I occasionally dabble in pandas, but I cannot claim deep knowledge. Today I had to filter out some rows from a DataFrame based on how many times a value occurs in a certain column, as in this example:

In [57]: table = pd.DataFrame([[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
    ...:                      columns=('group', 'letter'))
    ...: print(table)
   group letter
0      2      a
1      3      b
2      2      c
3      4      d
4      4      e
5      5      f

I want to remove all rows where a value in the group column appears only once.
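A minimal sketch of one built-in way to state this condition directly, assuming a reasonably recent pandas: Series.duplicated(keep=False) marks every occurrence of a value that appears more than once, so its result can serve as the row mask by itself. This is a common idiom, not necessarily the answer that was given on SO.

```python
import pandas as pd

# Same sample data as in the question
table = pd.DataFrame([[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
                     columns=['group', 'letter'])

# duplicated(keep=False) is True for *every* occurrence of a repeated value
# (not only the second and later ones), so it is False exactly for the
# values that appear only once in the column.
mask = table['group'].duplicated(keep=False)

# Boolean indexing keeps the original row order and index labels.
result = table[mask]
print(result)
```

This covers exactly the "appears only once" case; a threshold other than two occurrences would need actual per-group counts.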

I hacked around the problem with this inelegant solution:

In [58]: pd.concat(df for _, df in table.groupby(by=['group']) if len(df) > 1)
Out[58]: 
   group letter
0      2      a
2      2      c
3      4      d
4      4      e
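For comparison, pandas ships a groupby method built for this keep-or-drop-whole-groups pattern: DataFrameGroupBy.filter, which calls a function once per group and keeps the rows of every group for which it returns True. A minimal sketch, assuming the same sample table; it is one possible replacement for the concat loop, not a claim about what the SO answer used.

```python
import pandas as pd

table = pd.DataFrame([[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
                     columns=['group', 'letter'])

# filter() passes each group's sub-DataFrame as g and keeps all rows of the
# groups where the lambda returns True (here: groups with more than one row).
result = table.groupby('group').filter(lambda g: len(g) > 1)
print(result)

# The concat-over-groups version from the question, for comparison.
hack = pd.concat(df for _, df in table.groupby('group') if len(df) > 1)
print(result.index.equals(hack.index))
```

One difference worth knowing: filter returns the surviving rows in their original order, while the concat loop emits them grouped by key. In this small example the two orders happen to coincide.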

But I bet there are proper ways to achieve the same goal.

Can anyone suggest a more "pandaic" solution?

Thanks in advance
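The SO answer itself is not reproduced in this thread, so here is just one more common idiom, sketched under the assumption of a recent pandas: groupby(...).transform('size') broadcasts each group's row count back onto every row, producing a Series aligned with the original index that can be compared against any threshold. The helper name drop_rare_groups and its min_count parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def drop_rare_groups(df: pd.DataFrame, column: str, min_count: int = 2) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose value in `column` occurs
    at least `min_count` times in the whole frame."""
    # transform('size') returns one value per *row*: the size of the group
    # that row belongs to, aligned with df's index.
    group_sizes = df.groupby(column)[column].transform('size')
    return df[group_sizes >= min_count]


table = pd.DataFrame([[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
                     columns=['group', 'letter'])

print(drop_rare_groups(table, 'group'))               # drops the rows of groups 3 and 5
print(drop_rare_groups(table, 'group', min_count=3))  # no group has 3 rows, so empty
```

Because the mask is built in one vectorized pass, no Python function runs per group, which tends to make this faster than filter when there are many small groups.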

RE: Downsizing non-representative data in DataFrame - volcano63 - Sep-28-2018

In case anyone is interested: with some trepidation, I posted this question on SO (those in the know will understand my reluctance) and got an answer, without being hassled for a whole half an hour and counting.