(Jul-13-2023, 02:48 PM)deanhystad Wrote: 2. I do not understand your question. compute_percentages() already calculates percentage, percentage_start, and percentage_end. Why are you writing these again?
You should describe what you are trying to achieve, not how you plan to achieve it. As you said, you are a beginner, and maybe your approach to the problem is completely wrong. My early attempts using pandas resulted in bloated, messy code that was replaced with a few simple commands once I had a clue about how pandas works.
This is what I would like to achieve:
About 2: the code from my first post works perfectly, and your explanations are helping me start to understand it. I would like to ask whether the function compute_percentages() could be split into three separate functions, purely for learning and experimenting.
I have a source dataframe with the columns ['question', 'type', 'value', 'type_code'].
With this I can add a new column called "percentage":
def compute_percentage(df):
    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage)
    .reset_index(drop=True)
)
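For anyone who wants to run this step on its own, here is a minimal self-contained sketch. The sample rows are made up purely for illustration, and it uses groupby(...).transform("sum") instead of the apply() call above. That is a swapped-in technique, not the code from my first post; as far as I understand it should give the same percentage column without relying on how apply() treats the grouping column, which differs between pandas versions.

```python
import pandas as pd

# Hypothetical sample data with the same four columns as the real dataframe.
source = pd.DataFrame(
    {
        "question": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
        "type": ["Disagree", "Neither agree nor disagree", "Agree"] * 2,
        "value": [10, 30, 60, 5, 5, 10],
        "type_code": [-1, 0, 1, -1, 0, 1],
    }
)

# Total of "value" within each question, repeated on every row of that question.
group_totals = source.groupby("question")["value"].transform("sum")

# Share of each row within its own question, as a percentage.
source["percentage"] = source["value"] / group_totals * 100

print(source)
```

Because transform() returns a result aligned with the original rows, no reset_index() is needed afterwards.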
Now the source dataframe has the following columns:
['question', 'type', 'value', 'type_code', 'percentage']
Next I want to add a new column called percentage_end with the code below, but it does not work properly (it gives wrong results):
def compute_percentage_end(df):
    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    perc = df["percentage"]
    df["percentage_end"] = perc.cumsum() - (perc.iloc[-2] + perc.iloc[-1] + perc.iloc[0] / 2)
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage_end)
    .reset_index(drop=True)
)
I do not know why it doesn't work.
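One possible cause, offered as a guess rather than a confirmed diagnosis: the comment says the result should be centered on type_code 0, so the offset presumably has to be built from the rows whose type_code is -2, -1 and 0. .iloc[-2], .iloc[-1] and .iloc[0] instead pick the second-to-last, last and first rows by position. The sketch below shows the difference on made-up numbers. It assumes a five-point scale with type_code running from -2 to 2 and rows sorted by type_code, neither of which is stated above.

```python
import pandas as pd

# Hypothetical percentages for ONE question, indexed by type_code (-2..2).
perc = pd.Series([10.0, 20.0, 40.0, 20.0, 10.0], index=[-2, -1, 0, 1, 2])

# Label-based lookup: the rows whose type_code is -2, -1 and 0.
by_label = perc.loc[-2] + perc.loc[-1] + perc.loc[0] / 2

# Position-based lookup: second-to-last row, last row, first row.
by_position = perc.iloc[-2] + perc.iloc[-1] + perc.iloc[0] / 2

print(by_label, by_position)  # the two offsets are not the same

# With the label-based offset, the neutral bar straddles zero.
percentage_end = perc.cumsum() - by_label
print(percentage_end)
```

If this is the cause, then setting type_code as the index and sorting by it inside compute_percentage_end(), and using .loc in place of .iloc, may be enough to fix the results.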
Maybe all three columns (percentage, percentage_end, percentage_start) have to be calculated together, as in my original code?
If that is not the case, then once percentage_end works I would like to add the last column, percentage_start.
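A rough sketch of what that third function might look like, assuming each bar starts at its own end minus its own percentage. That formula is my assumption about how the columns relate, and both add_percentage_start() and the sample values are hypothetical.

```python
import pandas as pd


def add_percentage_start(df):
    # Assumption: a bar's start is its end minus its own width (percentage).
    out = df.copy()
    out["percentage_start"] = out["percentage_end"] - out["percentage"]
    return out


# Hypothetical rows for one question, with percentage_end already centered on type_code 0.
demo = pd.DataFrame(
    {
        "type_code": [-2, -1, 0, 1, 2],
        "percentage": [10.0, 20.0, 40.0, 20.0, 10.0],
        "percentage_end": [-40.0, -20.0, 20.0, 40.0, 50.0],
    }
)

demo = add_percentage_start(demo)
print(demo)
```

Under that assumption the subtraction is done row by row, so this step would not need groupby() at all. It only needs the percentage and percentage_end columns to exist already, which suggests the three columns can be computed in separate steps as long as they are added in that order.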
Today I bought the book "Pandas for Everyone", which should arrive next week, so I will be studying it patiently.