 Removing rows at random based on the value of a specific column
#1
As the title suggests, I am trying to randomly remove rows from a pandas dataframe, restricted to the rows (instances) that have a certain value in a given column.

For example, suppose I had the following dataframe:

[Image: the example dataframe; see the attachment if it doesn't show]

and I wanted to remove at random 33% of the rows which have a value of 0 in the 'balon_dor_winner' column, how would I go about doing it?

I have tried the following:

df.drop(df.loc[df['balon_dor_winner']==0].sample(frac=0.33).index)

and also:

df.drop(df.query('balon_dor_winner == 0').sample(frac=.33).index)

but neither has worked so far.


#2
What about adding another column with random numbers and dropping if winner is 0 and new column < 0.33?
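
A minimal sketch of this suggestion, not code from the thread. The dataframe contents, the helper column name `rand`, and the seed are all assumptions made for illustration. Note that each matching row is dropped independently with probability 0.33, so the share actually removed is only approximately 33%.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataframe in the question.
df = pd.DataFrame({
    "player": [f"p{i}" for i in range(12)],
    "balon_dor_winner": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

rng = np.random.default_rng(seed=0)  # seeded only so the example is repeatable
df["rand"] = rng.random(len(df))     # one uniform draw in [0, 1) per row

# Mark rows that are non-winners AND drew a number below 0.33.
to_drop = (df["balon_dor_winner"] == 0) & (df["rand"] < 0.33)

# Keep everything else, then discard the helper column.
reduced = df.loc[~to_drop].drop(columns="rand")
```

Rows where the column is 1 are never marked, so they always survive.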
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures

#3
Yeah, that sounds okay, but I would really have liked to do it as described. I'm sure it could be done in R; it's just that I want to start using pandas more.
#4
Well, instead of adding a column you could make a boolean list with both conditions and subset with that.
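
One way this could look, as a sketch under assumptions rather than the poster's own code: the random draws stay in a standalone array instead of a column, and the two conditions are combined into a single boolean mask. The data and seed below are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the column name comes from the question.
df = pd.DataFrame({"balon_dor_winner": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]})

rng = np.random.default_rng(seed=0)  # seeded only so the example is repeatable

is_zero = (df["balon_dor_winner"] == 0).to_numpy()  # condition 1: value is 0
unlucky = rng.random(len(df)) < 0.33                # condition 2: random draw below 0.33

keep = ~(is_zero & unlucky)  # True for every row that should stay
subset = df[keep]            # boolean indexing; df itself is left untouched
```

As with the extra-column variant, the fraction removed is 33% only on average, because each row is decided independently.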
#5
Got it. My mistake was not assigning the result back to a variable, like so:
df = df.drop(df.query('salary == 0').sample(frac=.41).index)
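
To illustrate why the assignment matters: `DataFrame.drop` returns a new dataframe by default and leaves the original alone. The sketch below uses made-up data with the column name from the first post, and `random_state=0` is added only to make the run repeatable; none of it is the poster's actual data.

```python
import pandas as pd

# Hypothetical stand-in data with nine 0 rows and three 1 rows.
df = pd.DataFrame({"balon_dor_winner": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]})
n_before = len(df)

# Index labels of a random 33% of the rows where the column is 0.
drop_idx = (
    df.query("balon_dor_winner == 0")
      .sample(frac=0.33, random_state=0)
      .index
)

df.drop(drop_idx)                   # result is discarded; df is unchanged
unchanged = len(df) == n_before

df = df.drop(drop_idx)              # assigning the result back makes it stick
```

Unlike the random-number approaches, `sample(frac=...)` picks a fixed number of rows. Dropping by index label also assumes the labels are unique; with duplicate labels, every row sharing a sampled label would be removed.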