Python Forum

Pages: 1 2 3

Hi,
I am new here, so please bear with me, as I am a beginner.
The code below contains a function called compute_percentages.
I would like to ask why perc[-2] or perc[-1], for example, works perfectly inside that function, but when I do:

perc = (source["value"] / source["value"].sum()) * 100

this:

perc[-2]

raises an error: "ValueError: -2 is not in range".
I would be very grateful for any ideas. Thank you.

import pandas as pd
import altair as alt

source = pd.DataFrame([
      {
        "question": "Question 1",
        "type": "Strongly disagree",
        "value": 24,
      },
      {
        "question": "Question 1",
        "type": "Disagree",
        "value": 294,
      },
      {
        "question": "Question 1",
        "type": "Neither agree nor disagree",
        "value": 594,
      },
      {
        "question": "Question 1",
        "type": "Agree",
        "value": 1927,
      },
      {
        "question": "Question 1",
        "type": "Strongly agree",
        "value": 376,
      },
      {
        "question": "Question 2",
        "type": "Strongly disagree",
        "value": 2,
      },
      {
        "question": "Question 2",
        "type": "Disagree",
        "value": 2,
      },
      {
        "question": "Question 2",
        "type": "Neither agree nor disagree",
        "value": 0,
      },
      {
        "question": "Question 2",
        "type": "Agree",
        "value": 7,
      },
      {
        "question": "Question 2",
        "type": "Strongly agree",
        "value": 11,
      },
      {
        "question": "Question 3",
        "type": "Strongly disagree",
        "value": 2,
      },
      {
        "question": "Question 3",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 3",
        "type": "Neither agree nor disagree",
        "value": 2,
      },
      {
        "question": "Question 3",
        "type": "Agree",
        "value": 4,
      },
      {
        "question": "Question 3",
        "type": "Strongly agree",
        "value": 2,
      },

      {
        "question": "Question 4",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 4",
        "type": "Disagree",
        "value": 2,
      },
      {
        "question": "Question 4",
        "type": "Neither agree nor disagree",
        "value": 1,
      },
      {
        "question": "Question 4",
        "type": "Agree",
        "value": 7,
      },
      {
        "question": "Question 4",
        "type": "Strongly agree",
        "value": 6,
      },

      {
        "question": "Question 5",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 5",
        "type": "Disagree",
        "value": 1,
      },
      {
        "question": "Question 5",
        "type": "Neither agree nor disagree",
        "value": 3,
      },
      {
        "question": "Question 5",
        "type": "Agree",
        "value": 16,
      },
      {
        "question": "Question 5",
        "type": "Strongly agree",
        "value": 4,
      },

      {
        "question": "Question 6",
        "type": "Strongly disagree",
        "value": 1,
      },
      {
        "question": "Question 6",
        "type": "Disagree",
        "value": 1,
      },
      {
        "question": "Question 6",
        "type": "Neither agree nor disagree",
        "value": 2,
      },
      {
        "question": "Question 6",
        "type": "Agree",
        "value": 9,
      },
      {
        "question": "Question 6",
        "type": "Strongly agree",
        "value": 3,
      },

      {
        "question": "Question 7",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 7",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 7",
        "type": "Neither agree nor disagree",
        "value": 1,
      },
      {
        "question": "Question 7",
        "type": "Agree",
        "value": 4,
      },
      {
        "question": "Question 7",
        "type": "Strongly agree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Neither agree nor disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Agree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Strongly agree",
        "value": 2,
      }
])

# Add type_code that we can sort by
source["type_code"] = source.type.map({
    "Strongly disagree": -2, 
    "Disagree": -1, 
    "Neither agree nor disagree": 0,
    "Agree": 1,
    "Strongly agree": 2
})
source

def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    
    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentages)
    .reset_index(drop=True)
)

Try:

print(perc)

An index of -2 is the element just before the last one (-1), so there must be at least two elements.

The error message is telling you that perc does not contain enough elements to access perc[-2].
Example:

>>> perc = []
>>> perc
[]
>>> perc.append(123)
>>> perc
[123]
>>> perc[-2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> perc.append(456)
>>> perc
[123, 456]
>>> perc[-2]
123
>>>

Looking at your code, perc is a pandas Series, not a list. Indexing a Series does not work like indexing a list; it works more like indexing a dictionary. The index value is a key, not a position. The error message would be

Error:
KeyError: -2

This error will occur if you have a question that contains no "Strongly Disagree" type.
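
To illustrate the point about keys, here is a minimal sketch. The numbers are made up and only the type_code labels match the thread's data; it shows that -2 is looked up as a label, and that the lookup fails when a group has no such label.

```python
import pandas as pd

# Hypothetical percentages for one question, keyed by type_code.
perc = pd.Series([10.0, 20.0, 30.0, 25.0, 15.0], index=[-2, -1, 0, 1, 2])

# -2 is treated as a label, so this is the "Strongly disagree" row,
# not the second element from the end.
first = perc[-2]

# The same lookup on a group without a -2 label raises KeyError.
partial = perc.drop(-2)
try:
    partial[-2]
    missing_label_error = False
except KeyError:
    missing_label_error = True

print(first, missing_label_error)
```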

But I don't get an error when I run your code. I get this.

Output:
      question                        type  value  percentage  percentage_end  percentage_start
0   Question 1           Strongly disagree     24    0.746501      -18.382582        -19.129082
1   Question 1                    Disagree    294    9.144635       -9.237947        -18.382582
2   Question 1  Neither agree nor disagree    594   18.475894        9.237947         -9.237947
3   Question 1                       Agree   1927   59.937792       69.175739          9.237947
4   Question 1              Strongly agree    376   11.695179       80.870918         69.175739
5   Question 2           Strongly disagree      2    9.090909       -9.090909        -18.181818
6   Question 2                    Disagree      2    9.090909        0.000000         -9.090909
7   Question 2  Neither agree nor disagree      0    0.000000        0.000000          0.000000
8   Question 2                       Agree      7   31.818182       31.818182          0.000000
9   Question 2              Strongly agree     11   50.000000       81.818182         31.818182
10  Question 3           Strongly disagree      2   20.000000      -10.000000        -30.000000
11  Question 3                    Disagree      0    0.000000      -10.000000        -10.000000
12  Question 3  Neither agree nor disagree      2   20.000000       10.000000        -10.000000
13  Question 3                       Agree      4   40.000000       50.000000         10.000000
14  Question 3              Strongly agree      2   20.000000       70.000000         50.000000
15  Question 4           Strongly disagree      0    0.000000      -15.625000        -15.625000
16  Question 4                    Disagree      2   12.500000       -3.125000        -15.625000
17  Question 4  Neither agree nor disagree      1    6.250000        3.125000         -3.125000
18  Question 4                       Agree      7   43.750000       46.875000          3.125000
19  Question 4              Strongly agree      6   37.500000       84.375000         46.875000
20  Question 5           Strongly disagree      0    0.000000      -10.416667        -10.416667
21  Question 5                    Disagree      1    4.166667       -6.250000        -10.416667
22  Question 5  Neither agree nor disagree      3   12.500000        6.250000         -6.250000
23  Question 5                       Agree     16   66.666667       72.916667          6.250000
24  Question 5              Strongly agree      4   16.666667       89.583333         72.916667
25  Question 6           Strongly disagree      1    6.250000      -12.500000        -18.750000
26  Question 6                    Disagree      1    6.250000       -6.250000        -12.500000
27  Question 6  Neither agree nor disagree      2   12.500000        6.250000         -6.250000
28  Question 6                       Agree      9   56.250000       62.500000          6.250000
29  Question 6              Strongly agree      3   18.750000       81.250000         62.500000
30  Question 7           Strongly disagree      0    0.000000      -10.000000        -10.000000
31  Question 7                    Disagree      0    0.000000      -10.000000        -10.000000
32  Question 7  Neither agree nor disagree      1   20.000000       10.000000        -10.000000
33  Question 7                       Agree      4   80.000000       90.000000         10.000000
34  Question 7              Strongly agree      0    0.000000       90.000000         90.000000
35  Question 8           Strongly disagree      0    0.000000        0.000000          0.000000
36  Question 8                    Disagree      0    0.000000        0.000000          0.000000
37  Question 8  Neither agree nor disagree      0    0.000000        0.000000          0.000000
38  Question 8                       Agree      0    0.000000        0.000000          0.000000
39  Question 8              Strongly agree      2  100.000000      100.000000          0.000000

Are you sure the error you are getting is associated with the code in your post?

In the future please post the entire error message, including the trace.

In my case/code:

len(perc)

gives 40 elements, so I want to get the second from the end, which would be the value 0.000000.
But when I do perc[-2], it raises:

Error:
ValueError: -2 is not in range

What am I missing?

(Jul-12-2023, 05:05 PM) deanhystad wrote: Are you sure the error you are getting is associated with the code in your post?

I have included all the code, and you are right about the error, but my question is: why does this code work inside the function compute_percentages, but not separately, outside of it?

I don't understand your question. Inside the function perc is a series. It is going to look like this:

type_code
-2     0.746501  <- This is perc[-2]
-1     9.144635
 0    18.475894
 1    59.937792  <- This is not perc[-2]
 2    11.695179

It makes one of these for each group (question).

perc does not exist outside the function. You also cannot call the function on the whole dataframe without making the groups first, because the row index would then contain duplicate labels.
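
A rough sketch of what happens per group, using a small made-up sample (two questions, three answer types) instead of the full data. It only mimics the first two steps of compute_percentages, to show that each group gets its own perc with type_code labels.

```python
import pandas as pd

# Hypothetical two-question sample with the same kind of columns as the thread's data.
sample = pd.DataFrame({
    "question": ["Q1"] * 3 + ["Q2"] * 3,
    "type_code": [-1, 0, 1, -1, 0, 1],
    "value": [2, 3, 5, 1, 1, 2],
})

group_perc = {}
for question, df in sample.groupby("question"):
    # Same steps as the start of compute_percentages.
    df = df.set_index("type_code").sort_index()
    perc = (df["value"] / df["value"].sum()) * 100
    group_perc[question] = perc
    # The labels are -1, 0, 1 in every group, so perc[-1] is the "Disagree" row.
    print(question, list(perc.index), perc[-1])
```

Each group has only three rows here, yet perc[-1] never means "last row"; it always means the row labelled -1.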

Can you post the code that makes a list named "perc" that contains 40 elements? Is it something like this?

perc = source["percentage"]
print(perc[-2])

This would raise a KeyError because the row indices are 0, 1, ..., 39; -2 is not a valid index.
If you want an array of the values, ask for that.

perc = source["percentage"].values  # Returns a NumPy array of the values in the "percentage" column.
print(perc[-2])
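
For positional access there is also .iloc, which stays on the Series instead of converting it. A small sketch with made-up numbers (the name "percentage" is only borrowed from the thread):

```python
import pandas as pd

# Hypothetical column with the default row labels 0..4.
perc = pd.Series([5.0, 15.0, 30.0, 40.0, 10.0], name="percentage")

# Position-based lookup on the Series itself; labels are ignored.
second_from_end_iloc = perc.iloc[-2]

# Convert to a NumPy array first, then use ordinary array indexing.
second_from_end_numpy = perc.to_numpy()[-2]

print(second_from_end_iloc, second_from_end_numpy)
```

Both give the second value from the end, regardless of what the row labels are.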

(Jul-12-2023, 05:28 PM) deanhystad wrote: Can you post the code that makes a list named "perc" that contains 40 elements.

I have posted it previously, but here you are:

perc = (source["value"] / source["value"].sum()) * 100

perc[-2]

I tried to change perc to a dataframe but could not get it to work. I should admit that I am still learning Python, so sometimes even basic subjects are a struggle for me. I apologize for these basic questions. What I want to understand is why perc[-2] works inside the function and why it raises an error when it is written separately, outside the function.
The code I provided in my first post works perfectly. I just want to understand what is happening in it and why I got those errors when I started to experiment with it.
I hope this clarifies it a bit.

perc is a Series, essentially a single-column dataframe. Indexing a Series uses keys, not positions. If you want positional indexing, get the values, which returns a NumPy array.

perc.values[-2]

When you have questions like this, try printing the thing, or the type of the thing. Printing perc would show you why you cannot do perc[-2].

print(perc)

Output:
0      0.725076
1      8.882175
2     17.945619
3     58.217523
4     11.359517
5      0.060423
6      0.060423
7      0.000000
8      0.211480
9      0.332326
10     0.060423
11     0.000000
12     0.060423
13     0.120846
14     0.060423
15     0.000000
16     0.060423
17     0.030211
18     0.211480
19     0.181269
20     0.000000
21     0.030211
22     0.090634
23     0.483384
24     0.120846
25     0.030211
26     0.030211
27     0.060423
28     0.271903
29     0.090634
30     0.000000
31     0.000000
32     0.030211
33     0.120846
34     0.000000
35     0.000000
36     0.000000
37     0.000000
38     0.000000
39     0.060423
Name: value, dtype: float64

Notice that the row indices do not include -2.
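
Besides printing, the labels can be checked directly. A tiny sketch with made-up values and the default row labels:

```python
import pandas as pd

# Hypothetical values; pandas assigns the default labels 0, 1, 2.
perc = pd.Series([0.5, 1.5, 98.0])

labels = perc.index.tolist()
has_minus_two = -2 in perc.index  # is -2 one of the labels?
has_zero = 0 in perc.index        # is 0 one of the labels?

print(labels, has_minus_two, has_zero)
```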

OK, thank you very much for your kind explanations.

Why was it written like this in the function compute_percentages:

# Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)

and was not written as:

perc.values[-2]

Inside that function, perc is of class "pandas.core.series.Series", isn't it?

I have included that function below:

def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    
    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df

Maybe this will make the indexing clearer.

I made a small change to your code. Instead of computing a numeric type_code I left type_code as a string ("Strongly disagree"...). Now when compute_percentages() reindexes the dataframe to use the "type_code", the row indices are words, not ints. This required a change to the "percentage_end" calculation because there is no perc[-2], perc[-1], or perc[0]. These are now perc["Strongly disagree"], perc["Disagree"] and perc["Neither agree nor disagree"].

source["type_code"] = source["type"]  # Changed so type_code is words, not -2, -1, 0, 1, 2

def compute_percentages(df):
    # Set type as the index (the labels are now words); rows keep their original order
    df = df.set_index("type")

    # Compute percentage of value with question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree"
    # Notice that we have to use words to index perc, because the row indices are words, not numbers.
    df["percentage_end"] = perc.cumsum() - (
        perc["Strongly disagree"]
        + perc["Disagree"]
        + perc["Neither agree nor disagree"] / 2
    )

    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df

When you computed perc outside the function, you still created a Series. The row indices just happened to be numbers from 0 to 39, but they were still keys, not positions in some array or list.

One more attempt to make this clear. Here I sort the source dataframe by the "value" column.

source = source.sort_values("value")
print(source)

Output:
      question  value                   type_code  percentage  percentage_end  percentage_start
30  Question 7      0           Strongly disagree    0.000000      -10.000000        -10.000000
37  Question 8      0  Neither agree nor disagree    0.000000        0.000000          0.000000
36  Question 8      0                    Disagree    0.000000        0.000000          0.000000
35  Question 8      0           Strongly disagree    0.000000        0.000000          0.000000
34  Question 7      0              Strongly agree    0.000000       90.000000         90.000000
31  Question 7      0                    Disagree    0.000000      -10.000000        -10.000000
38  Question 8      0                       Agree    0.000000        0.000000          0.000000
7   Question 2      0  Neither agree nor disagree    0.000000        0.000000          0.000000
15  Question 4      0           Strongly disagree    0.000000      -15.625000        -15.625000
20  Question 5      0           Strongly disagree    0.000000      -10.416667        -10.416667
11  Question 3      0                    Disagree    0.000000      -10.000000        -10.000000
17  Question 4      1  Neither agree nor disagree    6.250000        3.125000         -3.125000
32  Question 7      1  Neither agree nor disagree   20.000000       10.000000        -10.000000
25  Question 6      1           Strongly disagree    6.250000      -12.500000        -18.750000
26  Question 6      1                    Disagree    6.250000       -6.250000        -12.500000
21  Question 5      1                    Disagree    4.166667       -6.250000        -10.416667
39  Question 8      2              Strongly agree  100.000000      100.000000          0.000000
14  Question 3      2              Strongly agree   20.000000       70.000000         50.000000
27  Question 6      2  Neither agree nor disagree   12.500000        6.250000         -6.250000
12  Question 3      2  Neither agree nor disagree   20.000000       10.000000        -10.000000
10  Question 3      2           Strongly disagree   20.000000      -10.000000        -30.000000
6   Question 2      2                    Disagree    9.090909        0.000000         -9.090909
5   Question 2      2           Strongly disagree    9.090909       -9.090909        -18.181818
16  Question 4      2                    Disagree   12.500000       -3.125000        -15.625000
22  Question 5      3  Neither agree nor disagree   12.500000        6.250000         -6.250000
29  Question 6      3              Strongly agree   18.750000       81.250000         62.500000
13  Question 3      4                       Agree   40.000000       50.000000         10.000000
33  Question 7      4                       Agree   80.000000       90.000000         10.000000
24  Question 5      4              Strongly agree   16.666667       89.583333         72.916667
19  Question 4      6              Strongly agree   37.500000       84.375000         46.875000
18  Question 4      7                       Agree   43.750000       46.875000          3.125000
8   Question 2      7                       Agree   31.818182       31.818182          0.000000
28  Question 6      9                       Agree   56.250000       62.500000          6.250000
9   Question 2     11              Strongly agree   50.000000       81.818182         31.818182
23  Question 5     16                       Agree   66.666667       72.916667          6.250000
0   Question 1     24           Strongly disagree    0.746501      -18.382582        -19.129082
1   Question 1    294                    Disagree    9.144635       -9.237947        -18.382582
4   Question 1    376              Strongly agree   11.695179       80.870918         69.175739
2   Question 1    594  Neither agree nor disagree   18.475894        9.237947         -9.237947
3   Question 1   1927                       Agree   59.937792       69.175739          9.237947

Now 30 is the index of the first row and 3 is the index of the last row. If I print values[3], it prints the value from the row labelled 3, which is now the last row, not the fourth.

values = source["value"]
print(values[3])

Output:
1927

Notice that it prints the value from the last row, not the 4th row.
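
The same effect on a much smaller scale, as a sketch with four made-up values. After sorting, the labels travel with their rows, so looking up by label and looking up by position give different answers.

```python
import pandas as pd

# Hypothetical values with the default labels 0..3.
values = pd.Series([1927, 0, 24, 594], name="value")

# Sorting reorders the rows; each row keeps its original label.
values = values.sort_values()

by_label = values[3]          # row labelled 3
by_label_loc = values.loc[3]  # same lookup, spelled explicitly
by_position = values.iloc[3]  # fourth row in the sorted order

print(list(values.index), by_label, by_position)
```

Writing .loc or .iloc explicitly makes it obvious which kind of lookup is intended.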

Andrzej_Andrzej

Larz60+

deanhystad

Andrzej_Andrzej

Andrzej_Andrzej

deanhystad

Andrzej_Andrzej

deanhystad

Andrzej_Andrzej

deanhystad