Posts: 21
Threads: 5
Joined: Jul 2023
Hi,
I am new here, so please bear with me as I am a beginner.
The code below contains a function called compute_percentages.
I would like to ask why perc[-2] or perc[-1], for example, works perfectly inside that function, but when I run this outside it:
perc = (source["value"] / source["value"].sum()) * 100
perc[-2]
the last line throws an error: "ValueError: -2 is not in range".
I would be very grateful for any ideas, thank you.
import pandas as pd
import altair as alt
source = pd.DataFrame([
{
"question": "Question 1",
"type": "Strongly disagree",
"value": 24,
},
{
"question": "Question 1",
"type": "Disagree",
"value": 294,
},
{
"question": "Question 1",
"type": "Neither agree nor disagree",
"value": 594,
},
{
"question": "Question 1",
"type": "Agree",
"value": 1927,
},
{
"question": "Question 1",
"type": "Strongly agree",
"value": 376,
},
{
"question": "Question 2",
"type": "Strongly disagree",
"value": 2,
},
{
"question": "Question 2",
"type": "Disagree",
"value": 2,
},
{
"question": "Question 2",
"type": "Neither agree nor disagree",
"value": 0,
},
{
"question": "Question 2",
"type": "Agree",
"value": 7,
},
{
"question": "Question 2",
"type": "Strongly agree",
"value": 11,
},
{
"question": "Question 3",
"type": "Strongly disagree",
"value": 2,
},
{
"question": "Question 3",
"type": "Disagree",
"value": 0,
},
{
"question": "Question 3",
"type": "Neither agree nor disagree",
"value": 2,
},
{
"question": "Question 3",
"type": "Agree",
"value": 4,
},
{
"question": "Question 3",
"type": "Strongly agree",
"value": 2,
},
{
"question": "Question 4",
"type": "Strongly disagree",
"value": 0,
},
{
"question": "Question 4",
"type": "Disagree",
"value": 2,
},
{
"question": "Question 4",
"type": "Neither agree nor disagree",
"value": 1,
},
{
"question": "Question 4",
"type": "Agree",
"value": 7,
},
{
"question": "Question 4",
"type": "Strongly agree",
"value": 6,
},
{
"question": "Question 5",
"type": "Strongly disagree",
"value": 0,
},
{
"question": "Question 5",
"type": "Disagree",
"value": 1,
},
{
"question": "Question 5",
"type": "Neither agree nor disagree",
"value": 3,
},
{
"question": "Question 5",
"type": "Agree",
"value": 16,
},
{
"question": "Question 5",
"type": "Strongly agree",
"value": 4,
},
{
"question": "Question 6",
"type": "Strongly disagree",
"value": 1,
},
{
"question": "Question 6",
"type": "Disagree",
"value": 1,
},
{
"question": "Question 6",
"type": "Neither agree nor disagree",
"value": 2,
},
{
"question": "Question 6",
"type": "Agree",
"value": 9,
},
{
"question": "Question 6",
"type": "Strongly agree",
"value": 3,
},
{
"question": "Question 7",
"type": "Strongly disagree",
"value": 0,
},
{
"question": "Question 7",
"type": "Disagree",
"value": 0,
},
{
"question": "Question 7",
"type": "Neither agree nor disagree",
"value": 1,
},
{
"question": "Question 7",
"type": "Agree",
"value": 4,
},
{
"question": "Question 7",
"type": "Strongly agree",
"value": 0,
},
{
"question": "Question 8",
"type": "Strongly disagree",
"value": 0,
},
{
"question": "Question 8",
"type": "Disagree",
"value": 0,
},
{
"question": "Question 8",
"type": "Neither agree nor disagree",
"value": 0,
},
{
"question": "Question 8",
"type": "Agree",
"value": 0,
},
{
"question": "Question 8",
"type": "Strongly agree",
"value": 2,
}
])
# Add type_code that we can sort by
source["type_code"] = source.type.map({
"Strongly disagree": -2,
"Disagree": -1,
"Neither agree nor disagree": 0,
"Agree": 1,
"Strongly agree": 2
})
source
def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc
    return df
source = (
source
.groupby("question", group_keys=True)
.apply(compute_percentages)
.reset_index(drop=True)
)
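For readers following the arithmetic, here is a minimal sketch of what compute_percentages does for a single group, using the Question 1 values from the data above. The variable names q1 and offset are only for illustration and do not appear in the original code. The offset is everything to the left of the midpoint of the neutral bar, which is why the neutral bar ends up centered on zero.

```python
import pandas as pd

# Question 1 only, indexed by type_code as it is inside compute_percentages.
q1 = pd.Series([24, 294, 594, 1927, 376], index=[-2, -1, 0, 1, 2], name="value")

perc = q1 / q1.sum() * 100

# Everything left of the midpoint of the neutral bar:
# all of "Strongly disagree" (-2), all of "Disagree" (-1), half of neutral (0).
offset = perc[-2] + perc[-1] + perc[0] / 2

percentage_end = perc.cumsum() - offset
percentage_start = percentage_end - perc

print(percentage_start.round(6).tolist())
print(percentage_end.round(6).tolist())
```

Here perc[-2] works because -2 is one of the index labels, not because it counts back from the end.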
Posts: 12,030
Threads: 485
Joined: Sep 2016
Try:
print(perc)
Index -2 is the element just before the last one (index -1), so there must be at least two elements.
The error message is telling you that perc does not contain enough elements to access perc[-2].
example:
>>> perc = []
>>> perc
[]
>>> perc.append(123)
>>> perc
[123]
>>> perc[-2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> perc.append(456)
>>> perc
[123, 456]
>>> perc[-2]
123
>>>
Posts: 6,792
Threads: 20
Joined: Feb 2020
Jul-12-2023, 05:05 PM
(This post was last modified: Jul-12-2023, 05:20 PM by deanhystad.)
Looking at your code, perc is a pandas Series, not a list. Indexing a Series does not work like indexing a list; it works more like indexing a dictionary. The index value is a key (a label), not a position. I would expect the error message to be
Error: KeyError: -2
Inside compute_percentages, this error will occur if a question has no "Strongly disagree" row.
But I don't get an error when I run your code. I get this.
Output:
question type value percentage percentage_end percentage_start
0 Question 1 Strongly disagree 24 0.746501 -18.382582 -19.129082
1 Question 1 Disagree 294 9.144635 -9.237947 -18.382582
2 Question 1 Neither agree nor disagree 594 18.475894 9.237947 -9.237947
3 Question 1 Agree 1927 59.937792 69.175739 9.237947
4 Question 1 Strongly agree 376 11.695179 80.870918 69.175739
5 Question 2 Strongly disagree 2 9.090909 -9.090909 -18.181818
6 Question 2 Disagree 2 9.090909 0.000000 -9.090909
7 Question 2 Neither agree nor disagree 0 0.000000 0.000000 0.000000
8 Question 2 Agree 7 31.818182 31.818182 0.000000
9 Question 2 Strongly agree 11 50.000000 81.818182 31.818182
10 Question 3 Strongly disagree 2 20.000000 -10.000000 -30.000000
11 Question 3 Disagree 0 0.000000 -10.000000 -10.000000
12 Question 3 Neither agree nor disagree 2 20.000000 10.000000 -10.000000
13 Question 3 Agree 4 40.000000 50.000000 10.000000
14 Question 3 Strongly agree 2 20.000000 70.000000 50.000000
15 Question 4 Strongly disagree 0 0.000000 -15.625000 -15.625000
16 Question 4 Disagree 2 12.500000 -3.125000 -15.625000
17 Question 4 Neither agree nor disagree 1 6.250000 3.125000 -3.125000
18 Question 4 Agree 7 43.750000 46.875000 3.125000
19 Question 4 Strongly agree 6 37.500000 84.375000 46.875000
20 Question 5 Strongly disagree 0 0.000000 -10.416667 -10.416667
21 Question 5 Disagree 1 4.166667 -6.250000 -10.416667
22 Question 5 Neither agree nor disagree 3 12.500000 6.250000 -6.250000
23 Question 5 Agree 16 66.666667 72.916667 6.250000
24 Question 5 Strongly agree 4 16.666667 89.583333 72.916667
25 Question 6 Strongly disagree 1 6.250000 -12.500000 -18.750000
26 Question 6 Disagree 1 6.250000 -6.250000 -12.500000
27 Question 6 Neither agree nor disagree 2 12.500000 6.250000 -6.250000
28 Question 6 Agree 9 56.250000 62.500000 6.250000
29 Question 6 Strongly agree 3 18.750000 81.250000 62.500000
30 Question 7 Strongly disagree 0 0.000000 -10.000000 -10.000000
31 Question 7 Disagree 0 0.000000 -10.000000 -10.000000
32 Question 7 Neither agree nor disagree 1 20.000000 10.000000 -10.000000
33 Question 7 Agree 4 80.000000 90.000000 10.000000
34 Question 7 Strongly agree 0 0.000000 90.000000 90.000000
35 Question 8 Strongly disagree 0 0.000000 0.000000 0.000000
36 Question 8 Disagree 0 0.000000 0.000000 0.000000
37 Question 8 Neither agree nor disagree 0 0.000000 0.000000 0.000000
38 Question 8 Agree 0 0.000000 0.000000 0.000000
39 Question 8 Strongly agree 2 100.000000 100.000000 0.000000
Are you sure the error you are getting is associated with the code in your post?
In the future please post the entire error message, including the trace.
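To illustrate the difference between list indexing and Series indexing described at the top of this post, here is a small sketch with made-up values (the names as_list, default_series and labelled_series are hypothetical). With the default 0, 1, 2 labels, the traceback for the failed lookup typically also shows "ValueError: -2 is not in range" as the underlying cause of the KeyError, which may be the message quoted in the first post.

```python
import pandas as pd

values = [10.0, 20.0, 30.0]

# A list is indexed by position, so -2 counts back from the end.
as_list = list(values)
from_list = as_list[-2]

# A Series with the default index has the labels 0, 1, 2; -2 is not among them.
default_series = pd.Series(values)
try:
    default_series[-2]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# A Series whose labels include -2 finds the entry by label, not by position.
labelled_series = pd.Series(values, index=[-2, -1, 0])
from_labels = labelled_series[-2]

print(from_list, lookup_failed, from_labels)  # 20.0 True 10.0
```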
Posts: 21
Threads: 5
Joined: Jul 2023
In my case/code:
len(perc) gives 40, so I want to get the second element from the end, which would be the value 0.000000.
But when I do perc[-2], it raises an error:
Error: ValueError: -2 is not in range
What am I missing?
Posts: 21
Threads: 5
Joined: Jul 2023
(Jul-12-2023, 05:05 PM)deanhystad Wrote: Are you sure the error you are getting is associated with the code in your post?
I have included all the code, and you are right about the error, but my question is: why does this code work inside the function compute_percentages, but not separately, outside of it?
Posts: 6,792
Threads: 20
Joined: Feb 2020
Jul-12-2023, 05:36 PM
(This post was last modified: Jul-12-2023, 05:39 PM by deanhystad.)
I don't understand your question. Inside the function perc is a series. It is going to look like this:
type_code
-2 0.746501 <- This is perc[-2]
-1 9.144635
0 18.475894
1 59.937792 <- This is not perc[-2]
2 11.695179
It makes one of these for each group (question).
perc does not exist outside the function. Calling the function without making the groups first will not work as intended, because the row index will contain duplicate labels.
Can you post the code that makes a list named "perc" that contains 40 elements? Is it something like this?
perc = source["percentage"]
print(perc[-2])
This would raise a KeyError because the row indices are 0, 1, ..., 39. -2 is not a valid index.
If you want the values by position, ask for that.
perc = source["percentage"].values  # Returns a numpy array of the values in the "percentage" column.
print(perc[-2])
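As a hedged aside, .values is not the only way to index by position. The sketch below uses .iloc and .to_numpy(), which are standard pandas accessors but were not mentioned in this thread, on a five-row stand-in for the real 40-row dataframe.

```python
import pandas as pd

# Five-row stand-in for the real dataframe (Question 1 values only).
source = pd.DataFrame({"value": [24, 294, 594, 1927, 376]})
perc = source["value"] / source["value"].sum() * 100

# Position-based access: second element from the end.
second_last_iloc = perc.iloc[-2]
second_last_array = perc.to_numpy()[-2]

# Label-based access needs a label that exists; here the labels are 0..4,
# so the second-to-last row happens to carry the label 3.
second_last_label = perc[3]

print(second_last_iloc, second_last_array, second_last_label)
```

All three expressions return the same number here only because the default labels happen to match the positions.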
Posts: 21
Threads: 5
Joined: Jul 2023
(Jul-12-2023, 05:28 PM)deanhystad Wrote: Can you post the code that makes a list named "perc" that contains 40 elements.
I have posted it previously, but here you are:
perc = (source["value"] / source["value"].sum()) * 100
perc[-2]
I tried to change perc to a dataframe but could not get it to work. I admit that I am still learning Python, so sometimes even basic subjects are a struggle for me. I apologize for these basic questions. What I want to understand is why perc[-2] works inside the function but raises an error when written separately, outside the function.
The code I provided in my first post works perfectly; I just want to understand what is happening in it and why I got those errors when I started to experiment with it.
I hope this clarifies it a bit.
Posts: 6,792
Threads: 20
Joined: Feb 2020
Jul-12-2023, 05:57 PM
(This post was last modified: Jul-12-2023, 05:57 PM by deanhystad.)
perc is a Series, essentially a single-column dataframe. Indexing a Series uses keys, not positions. If you want positional indexing, get the values. That will return a numpy array.
perc.values[-2]
When you have questions like this, try printing the thing, or the type of the thing. Printing perc would show you why you cannot do perc[-2].
print(perc)
Output:
0 0.725076
1 8.882175
2 17.945619
3 58.217523
4 11.359517
5 0.060423
6 0.060423
7 0.000000
8 0.211480
9 0.332326
10 0.060423
11 0.000000
12 0.060423
13 0.120846
14 0.060423
15 0.000000
16 0.060423
17 0.030211
18 0.211480
19 0.181269
20 0.000000
21 0.030211
22 0.090634
23 0.483384
24 0.120846
25 0.030211
26 0.030211
27 0.060423
28 0.271903
29 0.090634
30 0.000000
31 0.000000
32 0.030211
33 0.120846
34 0.000000
35 0.000000
36 0.000000
37 0.000000
38 0.000000
39 0.060423
Name: value, dtype: float64
Notice that the row indices do not include -2.
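Following the "print the thing, or the type of the thing" advice, here is a small sketch that compares the index of perc outside the function with the index it has inside compute_percentages. It uses a single question's rows as a stand-in for the full data, and the names outside and inside are hypothetical.

```python
import pandas as pd

source = pd.DataFrame({
    "type_code": [-2, -1, 0, 1, 2],
    "value": [24, 294, 594, 1927, 376],
})

# Outside the function: the Series keeps the default labels 0..4.
outside = source["value"] / source["value"].sum() * 100

# Inside the function: type_code becomes the index before perc is computed.
df = source.set_index("type_code").sort_index()
inside = df["value"] / df["value"].sum() * 100

for name, perc in [("outside", outside), ("inside", inside)]:
    # Show the type, the labels, and whether -2 is one of the labels.
    print(name, type(perc).__name__, list(perc.index), -2 in perc.index)
```

Both objects are Series; the only difference is which labels they carry, and that decides whether perc[-2] can be found.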
Posts: 21
Threads: 5
Joined: Jul 2023
OK, thank you very much for your kind explanations.
Why, in the function compute_percentages, was it written as:
# Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
and not as:
perc.values[-2]
The perc created inside that function is of class "pandas.core.series.Series", isn't it?
I have included that function below:
def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc
    return df
Posts: 6,792
Threads: 20
Joined: Feb 2020
Jul-12-2023, 07:17 PM
(This post was last modified: Jul-12-2023, 07:18 PM by deanhystad.)
Maybe this will make the indexing clearer.
I made a small change to your code. Instead of computing a numeric type_code I left type_code as a string ("Strongly disagree"...). Now when compute_percentages() reindexes the dataframe to use the "type_code", the row indices are words, not ints. This required a change to the "percentage_end" calculation because there is no perc[-2], perc[-1], or perc[0]. These are now perc["Strongly disagree"], perc["Disagree"] and perc["Neither agree nor disagree"].
source["type_code"] = source["type"] # Changed so type_code is words, not -2, -1, 0, 1, 2
def compute_percentages(df):
    # Set type (the same words as type_code) as the index.
    # No sort here: the rows of each question are already in order.
    df = df.set_index("type")
    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    # Compute percentage end, centered on "Neither agree nor disagree"
    # Notice that we have to use words to index perc, because the row indices are words, not numbers.
    df["percentage_end"] = perc.cumsum() - (
        perc["Strongly disagree"]
        + perc["Disagree"]
        + perc["Neither agree nor disagree"] / 2
    )
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc
    return df
When computing perc outside the function, you still created a Series. The row indices just happened to be numbers from 0 to 39, but they were still keys, not positions in some array or list.
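A short sketch of the same idea with word labels, using the Question 1 percentages from the output earlier in the thread. The .loc and .iloc accessors are standard pandas but were not used in this thread; they are shown here only to make the label/position distinction explicit.

```python
import pandas as pd

perc = pd.Series(
    [0.746501, 9.144635, 18.475894, 59.937792, 11.695179],
    index=[
        "Strongly disagree",
        "Disagree",
        "Neither agree nor disagree",
        "Agree",
        "Strongly agree",
    ],
)

by_label = perc["Disagree"]     # look up by key, like a dictionary
by_loc = perc.loc["Disagree"]   # explicit label-based lookup
by_position = perc.iloc[1]      # explicit position-based lookup

# -2 is not one of the labels, so it cannot be used as a key here.
has_minus_two = -2 in perc.index

print(by_label, by_loc, by_position, has_minus_two)
```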
One more attempt to make this clear. Here I sort the source dataframe by the "value" column.
source = source.sort_values("value")
print(source)
Output:
question value type_code percentage percentage_end percentage_start
30 Question 7 0 Strongly disagree 0.000000 -10.000000 -10.000000
37 Question 8 0 Neither agree nor disagree 0.000000 0.000000 0.000000
36 Question 8 0 Disagree 0.000000 0.000000 0.000000
35 Question 8 0 Strongly disagree 0.000000 0.000000 0.000000
34 Question 7 0 Strongly agree 0.000000 90.000000 90.000000
31 Question 7 0 Disagree 0.000000 -10.000000 -10.000000
38 Question 8 0 Agree 0.000000 0.000000 0.000000
7 Question 2 0 Neither agree nor disagree 0.000000 0.000000 0.000000
15 Question 4 0 Strongly disagree 0.000000 -15.625000 -15.625000
20 Question 5 0 Strongly disagree 0.000000 -10.416667 -10.416667
11 Question 3 0 Disagree 0.000000 -10.000000 -10.000000
17 Question 4 1 Neither agree nor disagree 6.250000 3.125000 -3.125000
32 Question 7 1 Neither agree nor disagree 20.000000 10.000000 -10.000000
25 Question 6 1 Strongly disagree 6.250000 -12.500000 -18.750000
26 Question 6 1 Disagree 6.250000 -6.250000 -12.500000
21 Question 5 1 Disagree 4.166667 -6.250000 -10.416667
39 Question 8 2 Strongly agree 100.000000 100.000000 0.000000
14 Question 3 2 Strongly agree 20.000000 70.000000 50.000000
27 Question 6 2 Neither agree nor disagree 12.500000 6.250000 -6.250000
12 Question 3 2 Neither agree nor disagree 20.000000 10.000000 -10.000000
10 Question 3 2 Strongly disagree 20.000000 -10.000000 -30.000000
6 Question 2 2 Disagree 9.090909 0.000000 -9.090909
5 Question 2 2 Strongly disagree 9.090909 -9.090909 -18.181818
16 Question 4 2 Disagree 12.500000 -3.125000 -15.625000
22 Question 5 3 Neither agree nor disagree 12.500000 6.250000 -6.250000
29 Question 6 3 Strongly agree 18.750000 81.250000 62.500000
13 Question 3 4 Agree 40.000000 50.000000 10.000000
33 Question 7 4 Agree 80.000000 90.000000 10.000000
24 Question 5 4 Strongly agree 16.666667 89.583333 72.916667
19 Question 4 6 Strongly agree 37.500000 84.375000 46.875000
18 Question 4 7 Agree 43.750000 46.875000 3.125000
8 Question 2 7 Agree 31.818182 31.818182 0.000000
28 Question 6 9 Agree 56.250000 62.500000 6.250000
9 Question 2 11 Strongly agree 50.000000 81.818182 31.818182
23 Question 5 16 Agree 66.666667 72.916667 6.250000
0 Question 1 24 Strongly disagree 0.746501 -18.382582 -19.129082
1 Question 1 294 Disagree 9.144635 -9.237947 -18.382582
4 Question 1 376 Strongly agree 11.695179 80.870918 69.175739
2 Question 1 594 Neither agree nor disagree 18.475894 9.237947 -9.237947
3 Question 1 1927 Agree 59.937792 69.175739 9.237947
Now 30 is the index of the first row and 3 is the index of the last row. If I print values[3], it prints the value from the row labelled 3, which is now the last row:
values = source["value"]
print(values[3])
Output: 1927
Notice that it prints the value from the last row, not the 4th row.
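The same effect can be reproduced on a five-row stand-in (the Question 1 values only). This is a sketch, not the code that produced the output above; .iloc is added here to show what a positional lookup would return instead.

```python
import pandas as pd

source = pd.DataFrame({"value": [24, 294, 594, 1927, 376]})

# Sorting moves the rows but each row keeps its original label.
sorted_values = source.sort_values("value")["value"]
labels_after_sort = list(sorted_values.index)  # [0, 1, 4, 2, 3]

by_label = sorted_values[3]          # row labelled 3, now last: 1927
by_position = sorted_values.iloc[3]  # fourth row in sorted order: 594

print(labels_after_sort, by_label, by_position)
```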