Python Forum
[SOLVED on SO] Downsizing non-representative data in DataFrame
#1
Hi, folks,
I occasionally dabble in pandas, but I cannot claim deep knowledge. Today I had to filter out some rows from a DataFrame based on the occurrence of a value in a certain column, as in this example:
Output:
In [57]: table = pd.DataFrame([[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
    ...:                      columns=('group', 'letter'))
    ...: print(table)
   group letter
0      2      a
1      3      b
2      2      c
3      4      d
4      4      e
5      5      f
I want to remove all rows where a value in the group column appears only once.

I hacked around the problem with this inelegant solution:
Output:
In [58]: pd.concat(df for _, df in table.groupby(by=['group']) if len(df) > 1)
Out[58]:
   group letter
0      2      a
2      2      c
3      4      d
4      4      e
But I bet there are proper ways to achieve the same goal.

Can anyone suggest a more pandaic solution?

Thanks in advance
Test everything in a Python shell (IPython, Azure Notebook, etc.)
  • Someone gave you advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
#2
If anyone is interested: with trepidation, I posted this question on SO (those in the know will understand my reluctance) and got an answer, without being hassled for a whole half an hour and counting.
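
The answer received on SO is not reproduced in this thread. For readers who land here, below is a minimal sketch of three common pandas idioms for dropping rows whose group value occurs only once. It is not necessarily what was suggested on SO; the variable names (sizes, by_transform, by_duplicated, by_filter) are made up for illustration, and the frame is the one from the first post.

```python
import pandas as pd

# Same sample frame as in the first post.
table = pd.DataFrame(
    [[2, 'a'], [3, 'b'], [2, 'c'], [4, 'd'], [4, 'e'], [5, 'f']],
    columns=('group', 'letter'),
)

# Option 1: broadcast each group's size back onto its rows,
# then keep rows whose group has more than one member.
sizes = table.groupby('group')['group'].transform('size')
by_transform = table[sizes > 1]

# Option 2: keep=False marks every member of a duplicated value,
# so values that occur only once are the ones left unmarked.
by_duplicated = table[table.duplicated('group', keep=False)]

# Option 3: let groupby drop whole groups that fail a predicate.
by_filter = table.groupby('group').filter(lambda g: len(g) > 1)

print(by_transform)
```

On this frame all three keep the rows with index 0, 2, 3 and 4, the same rows as the pd.concat workaround. The first two are vectorised; the filter version calls a Python function once per group, which may be slower when there are many groups.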

