Python Forum

New Dataframe Column Based on Several Conditions
I am trying to add a new column to my dataframe based on several conditions. I have attached a sample of what I have, along with the column I am trying to add (highlighted). The table has customer IDs and revenues for 2021, 2020, 2019, 2018, and 2017. I want to add a "customer status" column derived from the revenue columns, taking one of the following values:

- 2021 New – Positive revenue in 2021 but no (or negative) revenue in 2020, 2019, 2018, and 2017.
- 2021 Lost – No (or negative) revenue in 2021, but positive revenue in 2020.
- 2021 Renewed – Positive revenue in 2021, no (or negative) revenue in 2020, and positive revenue in at least one of 2019, 2018, or 2017.
- N/A – Negative or zero revenue in 2021 and negative or zero revenue in 2020.
- 2021 Existing – Positive revenue in 2021 and positive revenue in 2020.

I have tried using np.select, for loops, and other methods, but the number of conditions is giving me trouble.
Sounds like a case for DataFrame.apply() in pandas, called with axis=1 so that the function receives one row at a time.
Basically, you write a function that takes a row and determines the value to be assigned. Because it is an ordinary Python function, it can hold as many complicated expressions as you need, and whatever it returns is placed in the new column for that row.
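For illustration, here is a minimal sketch of that approach. The column names ("2021" through "2017", "customer_id", "customer_status") and the sample rows are assumptions, since the attached sample is not visible here. It treats zero, negative, and missing revenue all as "no revenue".

```python
import pandas as pd

# Hypothetical sample data; the real column names may differ.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "2021": [100, 0, 50, -5, 80],
        "2020": [0, 200, 0, 0, 90],
        "2019": [0, 10, 30, 40, 0],
        "2018": [0, 0, 0, 0, 0],
        "2017": [-10, 0, 0, 0, 0],
    }
)


def customer_status(row):
    """Return the status label for a single customer row."""
    cur = row["2021"] > 0                               # positive revenue in 2021
    prev = row["2020"] > 0                              # positive revenue in 2020
    older = (row[["2019", "2018", "2017"]] > 0).any()   # any positive revenue 2017-2019
    if cur and prev:
        return "2021 Existing"
    if cur and older:
        return "2021 Renewed"
    if cur:
        return "2021 New"
    if prev:
        return "2021 Lost"
    return "N/A"


# axis=1 passes each row (not each column) to the function.
df["customer_status"] = df.apply(customer_status, axis=1)
print(df[["customer_id", "customer_status"]])
```

The order of the checks matters: once "Existing" is ruled out, 2020 is known to be non-positive, so the "Renewed" and "New" branches only need to look at the older years.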
See https://pandas.pydata.org/docs/reference...apply.html
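Since np.select was mentioned in the question, a vectorized version may also be worth trying. The trick that keeps the conditions manageable is to build three boolean masks first and combine those. This is only a sketch: the column names and sample rows below are assumptions, not taken from the attached sample.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real column names may differ.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "2021": [100, 0, 50, -5, 80],
        "2020": [0, 200, 0, 0, 90],
        "2019": [0, 10, 30, 40, 0],
        "2018": [0, 0, 0, 0, 0],
        "2017": [-10, 0, 0, 0, 0],
    }
)

# Three reusable boolean masks, one value per customer.
cur = df["2021"] > 0                                      # positive revenue in 2021
prev = df["2020"] > 0                                     # positive revenue in 2020
older = (df[["2019", "2018", "2017"]] > 0).any(axis=1)    # any positive revenue 2017-2019

conditions = [
    cur & prev,             # 2021 Existing
    cur & ~prev & older,    # 2021 Renewed
    cur & ~prev & ~older,   # 2021 New
    ~cur & prev,            # 2021 Lost
]
choices = ["2021 Existing", "2021 Renewed", "2021 New", "2021 Lost"]

# Rows matching none of the conditions (no revenue in 2021 or 2020) get "N/A".
df["customer_status"] = np.select(conditions, choices, default="N/A")
print(df[["customer_id", "customer_status"]])
```

On a large table this should run noticeably faster than a row-by-row function, at the cost of the logic being a little less readable.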