Python Forum
Removing rows at random based on the value of a specific column



Removing rows at random based on the value of a specific column - Mr_Keystrokes - Aug-23-2018

As the title suggests, I am trying to remove rows from a pandas dataframe at random, based on whether a given row (instance) has a certain value in a given column.

For example, suppose I had the following dataframe:

[Image: example dataframe; see attachment if the image doesn't show]


and I wanted to remove at random 33% of the rows which have a value of 0 in the 'balon_dor_winner' column, how would I go about doing it?

I have tried the following but it hasn't worked:

df.drop(df.loc[df['balon_dor_winner']==0].sample(frac=0.33).index)
That didn't work, and neither did this:

df.drop(df.query('balon_dor_winner == 0').sample(frac=.33).index)
but no luck so far.


RE: Removing rows at random based on the value of a specific column - ichabod801 - Aug-23-2018

What about adding another column with random numbers and dropping if winner is 0 and new column < 0.33?
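A rough sketch of that idea, in case it helps. The dataframe below is a made-up stand-in for the one in the image (only the 'balon_dor_winner' column name comes from the question), and the 'rand' column name and the seed are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the column name 'balon_dor_winner'
# is taken from the question.
df = pd.DataFrame({
    "player": [f"p{i}" for i in range(12)],
    "balon_dor_winner": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

# Seeded only so the sketch is repeatable; drop the seed for real randomness.
rng = np.random.default_rng(0)

# Helper column of uniform random numbers in [0, 1) (the name 'rand' is arbitrary).
df["rand"] = rng.random(len(df))

# Mark rows where winner is 0 AND the random number falls below 0.33.
drop_mask = (df["balon_dor_winner"] == 0) & (df["rand"] < 0.33)

# Keep everything else, then discard the helper column.
result = df[~drop_mask].drop(columns="rand")
```

Note that this removes each 0-row independently with probability 0.33, so the share actually removed will vary around 33% rather than hit it exactly.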


RE: Removing rows at random based on the value of a specific column - Mr_Keystrokes - Aug-23-2018

Yeah, that sounds okay, but I would really have liked to do it as described. I'm sure it could be done in R; it's just that I want to start using pandas more.


RE: Removing rows at random based on the value of a specific column - ichabod801 - Aug-23-2018

Well, instead of adding a column you could make a boolean list with both conditions and subset with that.
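Something along these lines, as a sketch only. The data is again a made-up stand-in (only 'balon_dor_winner' comes from the question), and the variable names and seed are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the column name 'balon_dor_winner'
# is taken from the question.
df = pd.DataFrame({
    "player": [f"p{i}" for i in range(12)],
    "balon_dor_winner": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

# Seeded only so the sketch is repeatable.
rng = np.random.default_rng(0)

# Condition 1: the row has a 0 in the column of interest.
is_zero = df["balon_dor_winner"] == 0

# Condition 2: a random draw per row falls below 0.33 (no column added to df).
unlucky = rng.random(len(df)) < 0.33

# Keep every row that does not satisfy both conditions at once.
keep = ~(is_zero & unlucky)
result = df[keep]
```

As with the extra-column version, each 0-row is dropped with probability 0.33, so the fraction removed is approximate, and df itself is left untouched.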


RE: Removing rows at random based on the value of a specific column - Mr_Keystrokes - Aug-24-2018

Got it. My mistake was not assigning the result back to a variable, like so:
df = df.drop(df.query('salary == 0').sample(frac=.41).index)
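For anyone landing here later, a small sketch of why the assignment matters: DataFrame.drop returns a new dataframe by default and leaves the original alone. The data below is a made-up stand-in using the column and fraction from the original question, and random_state is only there to make the sketch repeatable:

```python
import pandas as pd

# Hypothetical stand-in data: 9 rows with 0 and 3 rows with 1.
df = pd.DataFrame({
    "player": [f"p{i}" for i in range(12)],
    "balon_dor_winner": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

# Pick a random 33% of the rows where the column is 0 and grab their index labels.
to_drop = df.query("balon_dor_winner == 0").sample(frac=0.33, random_state=0).index

# Without assignment, the result is thrown away and df is unchanged.
df.drop(to_drop)
rows_without_assignment = len(df)

# With assignment, the shorter dataframe replaces the old one.
df = df.drop(to_drop)
rows_with_assignment = len(df)
```

Unlike the random-number approaches, sample(frac=...) removes a fixed number of rows (the fraction of matching rows, rounded). Passing inplace=True to drop is another way to modify df directly, though reassigning is the more common style.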