Python Forum
Negative indexing/selecting working and not working
#1
Hi,
I am new here, so please bear with me, as I am a beginner.
Below is some code with a function called compute_percentages.
I would like to ask why perc[-2] or perc[-1], for example, works perfectly inside that function, but when I do:
perc = (source["value"] / source["value"].sum()) * 100
and then:
perc[-2]
it throws an error: "ValueError: -2 is not in range".
I would be very grateful for ideas, thank you.

import pandas as pd
import altair as alt

source = pd.DataFrame([
      {
        "question": "Question 1",
        "type": "Strongly disagree",
        "value": 24,
      },
      {
        "question": "Question 1",
        "type": "Disagree",
        "value": 294,
      },
      {
        "question": "Question 1",
        "type": "Neither agree nor disagree",
        "value": 594,
      },
      {
        "question": "Question 1",
        "type": "Agree",
        "value": 1927,
      },
      {
        "question": "Question 1",
        "type": "Strongly agree",
        "value": 376,
      },
      {
        "question": "Question 2",
        "type": "Strongly disagree",
        "value": 2,
      },
      {
        "question": "Question 2",
        "type": "Disagree",
        "value": 2,
      },
      {
        "question": "Question 2",
        "type": "Neither agree nor disagree",
        "value": 0,
      },
      {
        "question": "Question 2",
        "type": "Agree",
        "value": 7,
      },
      {
        "question": "Question 2",
        "type": "Strongly agree",
        "value": 11,
      },
      {
        "question": "Question 3",
        "type": "Strongly disagree",
        "value": 2,
      },
      {
        "question": "Question 3",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 3",
        "type": "Neither agree nor disagree",
        "value": 2,
      },
      {
        "question": "Question 3",
        "type": "Agree",
        "value": 4,
      },
      {
        "question": "Question 3",
        "type": "Strongly agree",
        "value": 2,
      },

      {
        "question": "Question 4",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 4",
        "type": "Disagree",
        "value": 2,
      },
      {
        "question": "Question 4",
        "type": "Neither agree nor disagree",
        "value": 1,
      },
      {
        "question": "Question 4",
        "type": "Agree",
        "value": 7,
      },
      {
        "question": "Question 4",
        "type": "Strongly agree",
        "value": 6,
      },

      {
        "question": "Question 5",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 5",
        "type": "Disagree",
        "value": 1,
      },
      {
        "question": "Question 5",
        "type": "Neither agree nor disagree",
        "value": 3,
      },
      {
        "question": "Question 5",
        "type": "Agree",
        "value": 16,
      },
      {
        "question": "Question 5",
        "type": "Strongly agree",
        "value": 4,
      },

      {
        "question": "Question 6",
        "type": "Strongly disagree",
        "value": 1,
      },
      {
        "question": "Question 6",
        "type": "Disagree",
        "value": 1,
      },
      {
        "question": "Question 6",
        "type": "Neither agree nor disagree",
        "value": 2,
      },
      {
        "question": "Question 6",
        "type": "Agree",
        "value": 9,
      },
      {
        "question": "Question 6",
        "type": "Strongly agree",
        "value": 3,
      },

      {
        "question": "Question 7",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 7",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 7",
        "type": "Neither agree nor disagree",
        "value": 1,
      },
      {
        "question": "Question 7",
        "type": "Agree",
        "value": 4,
      },
      {
        "question": "Question 7",
        "type": "Strongly agree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Strongly disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Neither agree nor disagree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Agree",
        "value": 0,
      },
      {
        "question": "Question 8",
        "type": "Strongly agree",
        "value": 2,
      }
])

# Add type_code that we can sort by
source["type_code"] = source.type.map({
    "Strongly disagree": -2, 
    "Disagree": -1, 
    "Neither agree nor disagree": 0,
    "Agree": 1,
    "Strongly agree": 2
})
source

def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    
    # Compute percentage of value within the question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentages)
    .reset_index(drop=True)
)
#2
Try:
print(perc)
Index -2 refers to the element just before the last one (index -1), so the sequence must have at least two elements.

The error message is telling you that perc does not contain enough elements to access perc[-2].
Example:
>>> perc = []
>>> perc
[]
>>> perc.append(123)
>>> perc
[123]
>>> perc[-2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> perc.append(456)
>>> perc
[123, 456]
>>> perc[-2]
123
>>>
#3
Looking at your code, perc is a pandas Series, not a list. Indexing a Series does not work like indexing a list; it works more like indexing a dictionary. The index value is a key, not a position. The error message would be:
Error:
KeyError: -2
This error will occur if any question has no "Strongly disagree" row (type_code -2).
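A minimal sketch of that failure mode, assuming a hypothetical group that has no "Strongly disagree" row (the counts are made up so the percentages come out exact). The lookup works for a label that exists and raises KeyError for one that does not, just as a dict would:

```python
import pandas as pd

# Hypothetical group: there is no row with type_code -2 ("Strongly disagree")
group = pd.DataFrame({"type_code": [-1, 0, 1, 2], "value": [1, 1, 4, 2]})
df = group.set_index("type_code").sort_index()
perc = (df["value"] / df["value"].sum()) * 100

print(perc[-1])  # label -1 exists -> 12.5

# [] on a Series looks up labels, the way a dict looks up keys,
# so a missing label raises KeyError instead of counting from the end
error = None
try:
    perc[-2]
except KeyError as exc:
    error = exc
print(type(error).__name__)  # KeyError
```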

But I don't get an error when I run your code. I get this.
Output:
    question    type                        value  percentage  percentage_end  percentage_start
 0  Question 1  Strongly disagree              24    0.746501      -18.382582        -19.129082
 1  Question 1  Disagree                      294    9.144635       -9.237947        -18.382582
 2  Question 1  Neither agree nor disagree    594   18.475894        9.237947         -9.237947
 3  Question 1  Agree                        1927   59.937792       69.175739          9.237947
 4  Question 1  Strongly agree                376   11.695179       80.870918         69.175739
 5  Question 2  Strongly disagree               2    9.090909       -9.090909        -18.181818
 6  Question 2  Disagree                        2    9.090909        0.000000         -9.090909
 7  Question 2  Neither agree nor disagree      0    0.000000        0.000000          0.000000
 8  Question 2  Agree                           7   31.818182       31.818182          0.000000
 9  Question 2  Strongly agree                 11   50.000000       81.818182         31.818182
10  Question 3  Strongly disagree               2   20.000000      -10.000000        -30.000000
11  Question 3  Disagree                        0    0.000000      -10.000000        -10.000000
12  Question 3  Neither agree nor disagree      2   20.000000       10.000000        -10.000000
13  Question 3  Agree                           4   40.000000       50.000000         10.000000
14  Question 3  Strongly agree                  2   20.000000       70.000000         50.000000
15  Question 4  Strongly disagree               0    0.000000      -15.625000        -15.625000
16  Question 4  Disagree                        2   12.500000       -3.125000        -15.625000
17  Question 4  Neither agree nor disagree      1    6.250000        3.125000         -3.125000
18  Question 4  Agree                           7   43.750000       46.875000          3.125000
19  Question 4  Strongly agree                  6   37.500000       84.375000         46.875000
20  Question 5  Strongly disagree               0    0.000000      -10.416667        -10.416667
21  Question 5  Disagree                        1    4.166667       -6.250000        -10.416667
22  Question 5  Neither agree nor disagree      3   12.500000        6.250000         -6.250000
23  Question 5  Agree                          16   66.666667       72.916667          6.250000
24  Question 5  Strongly agree                  4   16.666667       89.583333         72.916667
25  Question 6  Strongly disagree               1    6.250000      -12.500000        -18.750000
26  Question 6  Disagree                        1    6.250000       -6.250000        -12.500000
27  Question 6  Neither agree nor disagree      2   12.500000        6.250000         -6.250000
28  Question 6  Agree                           9   56.250000       62.500000          6.250000
29  Question 6  Strongly agree                  3   18.750000       81.250000         62.500000
30  Question 7  Strongly disagree               0    0.000000      -10.000000        -10.000000
31  Question 7  Disagree                        0    0.000000      -10.000000        -10.000000
32  Question 7  Neither agree nor disagree      1   20.000000       10.000000        -10.000000
33  Question 7  Agree                           4   80.000000       90.000000         10.000000
34  Question 7  Strongly agree                  0    0.000000       90.000000         90.000000
35  Question 8  Strongly disagree               0    0.000000        0.000000          0.000000
36  Question 8  Disagree                        0    0.000000        0.000000          0.000000
37  Question 8  Neither agree nor disagree      0    0.000000        0.000000          0.000000
38  Question 8  Agree                           0    0.000000        0.000000          0.000000
39  Question 8  Strongly agree                  2  100.000000      100.000000          0.000000
Are you sure the error you are getting is associated with the code in your post?

In the future please post the entire error message, including the trace.
#4
In my case/code:
len(perc)
gives 40 elements, so I want to get the second element from the end, which would be 0.000000.
But when I do perc[-2], it raises:
Error:
ValueError: -2 is not in range
What am I missing?
#5
(Jul-12-2023, 05:05 PM)deanhystad Wrote: Are you sure the error you are getting is associated with the code in your post?

I have included all the code, and you are right about the error. My question is: why does this code work inside the compute_percentages function, but not separately, outside of it?
#6
I don't understand your question. Inside the function, perc is a Series. For Question 1 it looks like this:
type_code
-2     0.746501  <- This is perc[-2]
-1     9.144635
 0    18.475894
 1    59.937792  <- This is not perc[-2]
 2    11.695179
It makes one of these for each group (question).
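To see the difference between a label and a position in one of these per-group Series, here is a sketch that rebuilds the Question 1 group the same way compute_percentages() does and compares perc[-2] with perc.iloc[-2]. .iloc is pandas' positional indexer; it is not used in the original code and is shown only for comparison:

```python
import pandas as pd

# Question 1 rows from the first post, indexed by type_code and sorted,
# as compute_percentages() does for each group
group = pd.DataFrame({
    "type_code": [-2, -1, 0, 1, 2],
    "value": [24, 294, 594, 1927, 376],
}).set_index("type_code").sort_index()

perc = (group["value"] / group["value"].sum()) * 100

by_label = perc[-2]          # row whose label is -2 ("Strongly disagree")
by_position = perc.iloc[-2]  # second-to-last row ("Agree", label 1)

print(by_label)     # about 0.7465
print(by_position)  # about 59.94
```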

perc does not exist outside the function. You also cannot call the function on the whole dataframe without making the groups first; otherwise you would have duplicate labels in the row index.
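A small sketch of what the duplicate labels do, using only the first two questions from the first post: once type_code is the index of the ungrouped data, each label appears once per question, so a lookup like [-2] returns a Series of matches rather than a single number:

```python
import pandas as pd

# First two questions from the first post; type_code repeats once per question
df = pd.DataFrame({
    "question": ["Question 1"] * 5 + ["Question 2"] * 5,
    "type_code": [-2, -1, 0, 1, 2] * 2,
    "value": [24, 294, 594, 1927, 376, 2, 2, 0, 7, 11],
}).set_index("type_code")

hits = df["value"][-2]  # label -2 now matches two rows
print(type(hits).__name__)  # Series, not a single number
print(hits.tolist())        # [24, 2]
```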

Can you post the code that makes a list named "perc" that contains 40 elements? Is it something like this?
perc = source["percentage"]
print(perc[-2])
This would raise a KeyError because the row indices are 0, 1, ..., 39, and -2 is not one of them.
If you want a list of the values, ask for that.
perc = source["percentage"].values  # Returns a NumPy array of the values in the "percentage" column.
print(perc[-2])
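A sketch of positional access on the full 40-row data, keeping only the "value" column from the first post (the other columns are left out for brevity). .values and .to_numpy() both return a NumPy array, which indexes by position; .iloc gives the same positional lookup without leaving pandas:

```python
import pandas as pd

# "value" column from the first post, in the original row order (labels 0..39)
values = [24, 294, 594, 1927, 376, 2, 2, 0, 7, 11, 2, 0, 2, 4, 2,
          0, 2, 1, 7, 6, 0, 1, 3, 16, 4, 1, 1, 2, 9, 3,
          0, 0, 1, 4, 0, 0, 0, 0, 0, 2]
source = pd.DataFrame({"value": values})
perc = (source["value"] / source["value"].sum()) * 100

from_array = perc.values[-2]      # NumPy array: negative positions count from the end
from_numpy = perc.to_numpy()[-2]  # the same array, via the newer method name
from_iloc = perc.iloc[-2]         # positional indexing on the Series itself

print(len(perc), from_array, from_numpy, from_iloc)  # 40 0.0 0.0 0.0
```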
#7
(Jul-12-2023, 05:28 PM)deanhystad Wrote: Can you post the code that makes a list named "perc" that contains 40 elements.

I have posted it previously, but here you are:
perc = (source["value"] / source["value"].sum()) * 100
perc[-2]

I tried to convert perc to a dataframe but could not get it to work. I admit that I am still learning Python, so sometimes even basic topics are a struggle for me, and I apologize for the basic questions. What I want to understand is why perc[-2] works inside the function but raises an error when written separately, outside the function.
The code I provided in my first post works perfectly; I just want to understand what is happening in it and why I got those errors when I started experimenting with it.
I hope this clarifies it a bit.
#8
perc is a Series, essentially a single-column dataframe. Indexing a Series uses keys (labels), not positions. If you want positional indexing, get the values; that returns a NumPy array.
perc.values[-2]
When you have questions like this, try printing the thing, or the type of the thing. Printing perc would show you why you cannot do perc[-2].
print(perc)
Output:
0      0.725076
1      8.882175
2     17.945619
3     58.217523
4     11.359517
5      0.060423
6      0.060423
7      0.000000
8      0.211480
9      0.332326
10     0.060423
11     0.000000
12     0.060423
13     0.120846
14     0.060423
15     0.000000
16     0.060423
17     0.030211
18     0.211480
19     0.181269
20     0.000000
21     0.030211
22     0.090634
23     0.483384
24     0.120846
25     0.030211
26     0.030211
27     0.060423
28     0.271903
29     0.090634
30     0.000000
31     0.000000
32     0.030211
33     0.120846
34     0.000000
35     0.000000
36     0.000000
37     0.000000
38     0.000000
39     0.060423
Name: value, dtype: float64
Notice that the row indices do not include -2.
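Along the same lines, a quick way to check before indexing is to print the type and the index, or to use Series.get(), which does a label lookup and returns a default instead of raising. A small sketch with made-up numbers:

```python
import pandas as pd

perc = pd.Series([12.5, 12.5, 50.0, 25.0])  # made-up data, default labels 0..3

print(type(perc))         # <class 'pandas.core.series.Series'>
print(perc.index)         # RangeIndex(start=0, stop=4, step=1)
print(-2 in perc.index)   # False: "in" checks labels, not positions
print(perc.get(-2))       # None: .get() is a label lookup with a default
print(perc.get(-2, 0.0))  # 0.0
```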
#9
OK, thank you very much for your kind explanations.

Why is it written like this in the compute_percentages function:

# Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
and not as:
perc.values[-2]
Inside that function, perc is of class "pandas.core.series.Series", isn't it?

I included that function below:

def compute_percentages(df):
    # Set type_code as index and sort
    df = df.set_index("type_code").sort_index()
    
    # Compute percentage of value within the question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
    
    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df
#10
Maybe this will make the indexing clearer.

I made a small change to your code. Instead of computing a numeric type_code, I left type_code as a string ("Strongly disagree"...). Now when compute_percentages() sets the row index (to the "type" column, which holds the same words), the row indices are words, not ints. This required a change to the "percentage_end" calculation because there is no perc[-2], perc[-1], or perc[0]. These are now perc["Strongly disagree"], perc["Disagree"] and perc["Neither agree nor disagree"].
source["type_code"] = source["type"]  # Changed so type_code is words, not -2, -1, 0, 1, 2

def compute_percentages(df):
    # Use the answer text ("type") as the row index; rows are already in order, so no sort
    df = df.set_index("type")

    # Compute percentage of value within the question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    # Notice that we have to use words to index perc, because the row indices are words, not numbers.
    df["percentage_end"] = perc.cumsum() - (
        perc["Strongly disagree"]
        + perc["Disagree"]
        + perc["Neither agree nor disagree"] / 2
    )

    # Compute percentage start by subtracting percent
    df["percentage_start"] = df["percentage_end"] - perc

    return df
When you computed perc outside the function, you still created a Series. The row indices just happened to be the numbers 0 to 39, but they were still keys, not positions in an array or list.
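As a check, here is a sketch of the word-label version of the centering formula run on a single group (the Question 3 values from the first post). If the labels are doing their job, the results should match the Question 3 rows in the output posted earlier in this thread:

```python
import pandas as pd

# Question 3 rows from the first post, indexed by the answer text
df = pd.DataFrame({
    "type": ["Strongly disagree", "Disagree", "Neither agree nor disagree",
             "Agree", "Strongly agree"],
    "value": [2, 0, 2, 4, 2],
}).set_index("type")

perc = (df["value"] / df["value"].sum()) * 100

# Shift so the middle of "Neither agree nor disagree" sits at 0
offset = (
    perc["Strongly disagree"]
    + perc["Disagree"]
    + perc["Neither agree nor disagree"] / 2
)
percentage_end = perc.cumsum() - offset
percentage_start = percentage_end - perc

print(percentage_end.round(6).tolist())    # [-10.0, -10.0, 10.0, 50.0, 70.0]
print(percentage_start.round(6).tolist())  # [-30.0, -10.0, -10.0, 10.0, 50.0]
```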

One more attempt to make this clear. Here I sort the source dataframe by the "value" column.
source = source.sort_values("value")
print(source)
Output:
    question    value  type_code                   percentage  percentage_end  percentage_start
30  Question 7      0  Strongly disagree             0.000000      -10.000000        -10.000000
37  Question 8      0  Neither agree nor disagree    0.000000        0.000000          0.000000
36  Question 8      0  Disagree                      0.000000        0.000000          0.000000
35  Question 8      0  Strongly disagree             0.000000        0.000000          0.000000
34  Question 7      0  Strongly agree                0.000000       90.000000         90.000000
31  Question 7      0  Disagree                      0.000000      -10.000000        -10.000000
38  Question 8      0  Agree                         0.000000        0.000000          0.000000
 7  Question 2      0  Neither agree nor disagree    0.000000        0.000000          0.000000
15  Question 4      0  Strongly disagree             0.000000      -15.625000        -15.625000
20  Question 5      0  Strongly disagree             0.000000      -10.416667        -10.416667
11  Question 3      0  Disagree                      0.000000      -10.000000        -10.000000
17  Question 4      1  Neither agree nor disagree    6.250000        3.125000         -3.125000
32  Question 7      1  Neither agree nor disagree   20.000000       10.000000        -10.000000
25  Question 6      1  Strongly disagree             6.250000      -12.500000        -18.750000
26  Question 6      1  Disagree                      6.250000       -6.250000        -12.500000
21  Question 5      1  Disagree                      4.166667       -6.250000        -10.416667
39  Question 8      2  Strongly agree              100.000000      100.000000          0.000000
14  Question 3      2  Strongly agree               20.000000       70.000000         50.000000
27  Question 6      2  Neither agree nor disagree   12.500000        6.250000         -6.250000
12  Question 3      2  Neither agree nor disagree   20.000000       10.000000        -10.000000
10  Question 3      2  Strongly disagree            20.000000      -10.000000        -30.000000
 6  Question 2      2  Disagree                      9.090909        0.000000         -9.090909
 5  Question 2      2  Strongly disagree             9.090909       -9.090909        -18.181818
16  Question 4      2  Disagree                     12.500000       -3.125000        -15.625000
22  Question 5      3  Neither agree nor disagree   12.500000        6.250000         -6.250000
29  Question 6      3  Strongly agree               18.750000       81.250000         62.500000
13  Question 3      4  Agree                        40.000000       50.000000         10.000000
33  Question 7      4  Agree                        80.000000       90.000000         10.000000
24  Question 5      4  Strongly agree               16.666667       89.583333         72.916667
19  Question 4      6  Strongly agree               37.500000       84.375000         46.875000
18  Question 4      7  Agree                        43.750000       46.875000          3.125000
 8  Question 2      7  Agree                        31.818182       31.818182          0.000000
28  Question 6      9  Agree                        56.250000       62.500000          6.250000
 9  Question 2     11  Strongly agree               50.000000       81.818182         31.818182
23  Question 5     16  Agree                        66.666667       72.916667          6.250000
 0  Question 1     24  Strongly disagree             0.746501      -18.382582        -19.129082
 1  Question 1    294  Disagree                      9.144635       -9.237947        -18.382582
 4  Question 1    376  Strongly agree               11.695179       80.870918         69.175739
 2  Question 1    594  Neither agree nor disagree   18.475894        9.237947         -9.237947
 3  Question 1   1927  Agree                        59.937792       69.175739          9.237947
Now 30 is the index of the first row and 3 is the index of the last row. If I print values[3], I get the value whose index is 3:
values = source["value"]
print(values[3])
Output:
1927
Notice that it prints the value from the last row, not the 4th row.
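The same effect in a few lines, using only the Question 1 values. .loc makes the label lookup explicit and .iloc does the positional one; both are standard Series indexers, added here for comparison rather than taken from the code above:

```python
import pandas as pd

# Question 1 values from the first post; default labels 0..4
values = pd.Series([24, 294, 594, 1927, 376])
values = values.sort_values()  # labels travel with their values: 0, 1, 4, 2, 3

print(values[3])       # 1927: label 3, which is now the last row
print(values.loc[3])   # 1927: the same lookup, spelled explicitly as a label
print(values.iloc[3])  # 594: position 3, the fourth row
```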
Andrzej_Andrzej likes this post