Jul-12-2023, 07:17 PM
(This post was last modified: Jul-12-2023, 07:18 PM by deanhystad.)
Maybe this will make the indexing clearer.
I made a small change to your code. Instead of computing a numeric type_code, I left type_code as a string ("Strongly disagree"...). Now when compute_percentages() sets "type_code" as the dataframe's index, the row indices are words, not ints. This required a change to the "percentage_end" calculation, because there is no perc[-2], perc[-1], or perc[0]. These are now perc["Strongly disagree"], perc["Disagree"], and perc["Neither agree nor disagree"].
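Here is the word-label lookup on its own. This is a minimal sketch using the Question 1 counts from the output table in this post (24, 294, 594, 1927, 376); the name `counts` is mine, for illustration only. Because the Series is indexed by words, the centering offset has to be built from word labels, and a numeric label such as -2 simply does not exist:

```python
import pandas as pd

# Question 1 counts from the output table; "counts" is an illustrative name
counts = pd.Series(
    [24, 294, 594, 1927, 376],
    index=[
        "Strongly disagree",
        "Disagree",
        "Neither agree nor disagree",
        "Agree",
        "Strongly agree",
    ],
)

perc = counts / counts.sum() * 100

# The index holds words, so rows are looked up by word, not by -2, -1, 0
offset = perc["Strongly disagree"] + perc["Disagree"] + perc["Neither agree nor disagree"] / 2
percentage_end = perc.cumsum() - offset
percentage_start = percentage_end - perc

print(percentage_end.round(6))

# A numeric label lookup fails because no row is labelled -2
try:
    perc.loc[-2]
except KeyError:
    print("no row labelled -2")
```

The printed percentage_end values match the Question 1 rows of the table, for example -18.382582 for "Strongly disagree" and 9.237947 for "Neither agree nor disagree".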
```python
source["type_code"] = source["type"]  # Changed so type_code is words, not -2, -1, 0, 1, 2


def compute_percentages(df):
    # Use type_code (now words) as the row index
    df = df.set_index("type_code")

    # Compute percentage of value within the question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (formerly type_code 0).
    # Notice that we have to use words to index perc, because the row indices are words, not numbers.
    df["percentage_end"] = perc.cumsum() - (
        perc["Strongly disagree"] + perc["Disagree"] + perc["Neither agree nor disagree"] / 2
    )

    # Compute percentage start by subtracting the percentage
    df["percentage_start"] = df["percentage_end"] - perc
    return df
```

When computing perc outside the dataframe, you still created a Series. Its row indices just happened to be the numbers 0 to 39, but they were still keys (labels), not positions in some array or list.
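A quick way to convince yourself that the integers are keys: select one question from a small frame and look at the index of the resulting percentage Series. The data below is made up for illustration, not taken from your dataframe. The selected rows keep the labels they had in the full frame, so label 0 is not there, while `.iloc` still counts positions from 0:

```python
import pandas as pd

# Hypothetical stand-in for the source dataframe (two questions, three rows each)
df = pd.DataFrame({
    "question": ["Question 1"] * 3 + ["Question 2"] * 3,
    "value": [24, 294, 594, 5, 5, 2],
})

# Rows for Question 2 keep their original labels 3, 4, 5
q2 = df[df["question"] == "Question 2"]
perc = q2["value"] / q2["value"].sum() * 100

print(perc.index.tolist())  # [3, 4, 5]
print(perc.loc[3])          # label 3: the first row of this group
print(perc.iloc[0])         # position 0: the same row

# Label 0 belongs to Question 1, so it is not in this Series
print(0 in perc.index)      # False
```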
One more attempt to make this clear. Here I sort the source dataframe by the "value" column.
```python
source = source.sort_values("value")
print(source)
```
Output:
```
      question  value                   type_code  percentage  percentage_end  percentage_start
30  Question 7      0           Strongly disagree    0.000000      -10.000000        -10.000000
37  Question 8      0  Neither agree nor disagree    0.000000        0.000000          0.000000
36  Question 8      0                    Disagree    0.000000        0.000000          0.000000
35  Question 8      0           Strongly disagree    0.000000        0.000000          0.000000
34  Question 7      0              Strongly agree    0.000000       90.000000         90.000000
31  Question 7      0                    Disagree    0.000000      -10.000000        -10.000000
38  Question 8      0                       Agree    0.000000        0.000000          0.000000
7   Question 2      0  Neither agree nor disagree    0.000000        0.000000          0.000000
15  Question 4      0           Strongly disagree    0.000000      -15.625000        -15.625000
20  Question 5      0           Strongly disagree    0.000000      -10.416667        -10.416667
11  Question 3      0                    Disagree    0.000000      -10.000000        -10.000000
17  Question 4      1  Neither agree nor disagree    6.250000        3.125000         -3.125000
32  Question 7      1  Neither agree nor disagree   20.000000       10.000000        -10.000000
25  Question 6      1           Strongly disagree    6.250000      -12.500000        -18.750000
26  Question 6      1                    Disagree    6.250000       -6.250000        -12.500000
21  Question 5      1                    Disagree    4.166667       -6.250000        -10.416667
39  Question 8      2              Strongly agree  100.000000      100.000000          0.000000
14  Question 3      2              Strongly agree   20.000000       70.000000         50.000000
27  Question 6      2  Neither agree nor disagree   12.500000        6.250000         -6.250000
12  Question 3      2  Neither agree nor disagree   20.000000       10.000000        -10.000000
10  Question 3      2           Strongly disagree   20.000000      -10.000000        -30.000000
6   Question 2      2                    Disagree    9.090909        0.000000         -9.090909
5   Question 2      2           Strongly disagree    9.090909       -9.090909        -18.181818
16  Question 4      2                    Disagree   12.500000       -3.125000        -15.625000
22  Question 5      3  Neither agree nor disagree   12.500000        6.250000         -6.250000
29  Question 6      3              Strongly agree   18.750000       81.250000         62.500000
13  Question 3      4                       Agree   40.000000       50.000000         10.000000
33  Question 7      4                       Agree   80.000000       90.000000         10.000000
24  Question 5      4              Strongly agree   16.666667       89.583333         72.916667
19  Question 4      6              Strongly agree   37.500000       84.375000         46.875000
18  Question 4      7                       Agree   43.750000       46.875000          3.125000
8   Question 2      7                       Agree   31.818182       31.818182          0.000000
28  Question 6      9                       Agree   56.250000       62.500000          6.250000
9   Question 2     11              Strongly agree   50.000000       81.818182         31.818182
23  Question 5     16                       Agree   66.666667       72.916667          6.250000
0   Question 1     24           Strongly disagree    0.746501      -18.382582        -19.129082
1   Question 1    294                    Disagree    9.144635       -9.237947        -18.382582
4   Question 1    376              Strongly agree   11.695179       80.870918         69.175739
2   Question 1    594  Neither agree nor disagree   18.475894        9.237947         -9.237947
3   Question 1   1927                       Agree   59.937792       69.175739          9.237947
```
Now 30 is the index of the first row and 3 is the index of the last row. If I print values[3], it prints the value in the row whose index is 3:

```python
values = source["value"]
print(values[3])
```

Output:
```
1927
```
Notice that it prints the value from the last row, not the 4th row.
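If you want to be explicit about which kind of lookup you mean, pandas has `.loc` (always index labels) and `.iloc` (always positions). A small sketch with just the Question 1 values, whose default labels 0 to 4 end up in the same 0, 1, 4, 2, 3 order as the bottom of the table once they are sorted:

```python
import pandas as pd

# Question 1 values in their original row order, labelled 0-4
values = pd.Series([24, 294, 594, 1927, 376])

sorted_values = values.sort_values()
print(sorted_values.index.tolist())  # [0, 1, 4, 2, 3]

print(sorted_values.loc[3])   # label 3 -> 1927, now the last row
print(sorted_values.iloc[3])  # position 3 -> 594, the fourth row
```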