Python Forum

[SOLVED on SO] Downsizing non-representative data in DataFrame
Hi, folks,
I occasionally dabble in pandas, but I cannot claim deep knowledge. Today I had to filter out some rows from a DataFrame based on how often a value occurs in a certain column, as in this example:
Output:
In [57]: table = pd.DataFrame([[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
    ...:                      columns=('group', 'letter'))
    ...: print(table)
    ...:
   group letter
0      2      a
1      3      b
2      2      c
3      4      d
4      4      e
5      5      f
I want to remove all rows where a value in the group column appears only once.
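"Appears only once" boils down to "its count in the group column is 1", so one sketch is to count each value with Series.value_counts and map those counts back onto the rows. This assumes the same toy table as above; the names counts, mask and result are just illustrative.

```python
import pandas as pd

# Same toy data as in the session above.
table = pd.DataFrame(
    [[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
    columns=['group', 'letter'],
)

# value -> number of occurrences in the 'group' column
counts = table['group'].value_counts()

# Replace each row's group value by its count, keep rows whose count is > 1.
mask = table['group'].map(counts) > 1
result = table[mask]
print(result)  # rows 0, 2, 3, 4 (letters a, c, d, e)
```

Because the mask is a plain boolean Series aligned to the original index, the surviving rows keep their original order and index labels.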

I hacked around the problem with this inelegant solution:
Output:
In [58]: pd.concat(df for _, df in table.groupby(by=['group']) if len(df) > 1)
Out[58]:
   group letter
0      2      a
2      2      c
3      4      d
4      4      e
But I bet there are proper ways to achieve the same goal.

Can anyone suggest a more pandaic solution?!
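Two more fully vectorized options, sketched on the same toy table (these are general pandas idioms, not necessarily what anyone on SO proposed). Series.duplicated(keep=False) marks every occurrence of a repeated value, so it is False exactly for the values that appear once. GroupBy.transform('size') broadcasts each group's size back onto its rows, which also works for thresholds other than "more than one".

```python
import pandas as pd

table = pd.DataFrame(
    [[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
    columns=['group', 'letter'],
)

# keep=False flags *all* occurrences of duplicated values, not just the later ones.
by_duplicated = table[table['group'].duplicated(keep=False)]

# Per-row group size (same length as table), then a plain boolean mask.
group_size = table.groupby('group')['letter'].transform('size')
by_size = table[group_size > 1]

print(by_duplicated)  # rows 0, 2, 3, 4 (letters a, c, d, e)
```

Both produce the same frame here; changing `> 1` to, say, `>= 3` in the transform version is the easy way to raise the threshold.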

Thanks in advance
If anyone is interested: with some trepidation, I posted this question on SO (those in the know will understand my reluctance) and got an answer, without being hassled for a whole half hour and counting!