Negative indexing/selecting working and not working

Andrzej_Andrzej · Jul-12-2023, 07:25 PM

Thank you again for your patience and clarification.
I will thoroughly study what you have written, because this is a lot to digest for me.
I am very grateful for your replies.
best regards,
Andrzej

**deanhystad** · Jul-12-2023, 08:58 PM

The main thing to remember is that indexing for arrays and lists treats the index as an ordinal position. list[0] is the first element. array[5] is the 6th element. list[-2] is the second-to-last element, and it raises an IndexError if the list has only 1 element.

Indexing in pandas is like indexing in dictionaries. The index in series[5] treats 5 as a key, not an ordinal number. series[5] could be the first item in the series or the last. There is no way to know from the number alone, because the "5" is not the position of the row, it is the name of the row, and row names may have nothing in common with row positions. When you ask for series[5], pandas searches the index of "series" for a label that matches 5 and returns the associated value.
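
As a minimal sketch of the difference (the values and the row labels 5, 3, 1 are invented for illustration), a list is indexed by position, while a Series with an integer index is looked up by label:

```python
import pandas as pd

# A list is indexed by position.
letters = ["a", "b", "c"]
first = letters[0]            # first element
second_to_last = letters[-2]  # second element from the end

# A Series with an integer index is looked up by label, like a dict key.
# The labels 5, 3, 1 are made up for this example.
series = pd.Series(["a", "b", "c"], index=[5, 3, 1])
by_label = series[5]          # the row named 5, which happens to be the first row
by_position = series.iloc[0]  # .iloc always means position

# There is no row named 0, so plain [] raises KeyError here.
try:
    series[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(first, second_to_last, by_label, by_position, missing_label_raises)
```

Using .loc for labels and .iloc for positions keeps the intent explicit in either case.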

Andrzej_Andrzej · Jul-13-2023, 04:21 AM

Two more questions if I may, please:
1. How can I make the local object perc (local because it is inside a function) become global, so that it does not disappear after compute_percentages() finishes its work?
I have done it previously by:

perc = (source["value"] / source["value"].sum()) * 100

2. If I would like to split that function into three independent but consecutive functions that add the following columns in turn: percentage, percentage_end, and percentage_start, doing it step by step and updating the "source" dataframe at every step, how do I do it?

I have done it for percentage column like this:

def compute_percentage(df):
    # Compute percentage of value within question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage)
    .reset_index(drop=True)
)

but I can't create the remaining two, percentage_end and percentage_start. Is it possible, or must all 3 be calculated together as in the original code from my first post?

**deanhystad** · (This post was last modified: Jul-13-2023, 02:48 PM by deanhystad.)

1. You don't want to do this. compute_percentages() is called multiple times, once for each question group. The perc inside the function changes each time the function is called. If you could see perc outside the function you would only see the perc for the last question.

2. I do not understand your question. compute_percentages() already calculates percentage, percentage_start, and percentage_end. Why are you writing these again?

You should describe what you are trying to achieve, not how you plan to achieve it. As you said, you are a beginner, and maybe your approach to the problem is completely wrong. My early attempts using pandas resulted in bloated, messy code that was replaced with a few simple commands once I had a clue about how pandas works.
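
On point 1, here is a small sketch of why a global perc would not help, and one alternative that keeps every group's result. The two-question frame is invented for illustration, and groupby().transform("sum") is swapped in here as an alternative technique; it is not the code from the first post.

```python
import pandas as pd

# Hypothetical data: two questions, values made up for this example.
source = pd.DataFrame({
    "question": ["Q1", "Q1", "Q2", "Q2"],
    "value": [1, 3, 2, 2],
})

# perc is overwritten on every pass, so after the loop it only holds
# the percentages of the last group ("Q2").
for question, group in source.groupby("question"):
    perc = group["value"] / group["value"].sum() * 100
last_perc = perc.tolist()

# transform("sum") returns each row's group total, aligned with the
# original rows, so the percentages for all questions land in one column.
group_total = source.groupby("question")["value"].transform("sum")
source["percentage"] = source["value"] / group_total * 100

print(last_perc)
print(source)
```

Storing the result as a column keeps each question's percentages next to the rows they belong to, which a single global variable cannot do.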

Andrzej_Andrzej · Jul-13-2023, 04:13 PM

(Jul-13-2023, 02:48 PM)deanhystad Wrote: 2. I do not understand your question. compute_percentages() already calculates percentage, percentage_start, and percentage_end. Why are you writing these again?

You should describe what you are trying to achieve, not how you plan to achieve it. As you said, you are a beginner, and maybe your approach to the problem is completely wrong. My early attempts using pandas resulted in bloated, messy code that was replaced with a few simple commands once I had a clue about how pandas works.

This is what I would like to achieve:
About 2: the code from my first post works perfectly, and all your explanations have helped me start to understand it. I would like to ask whether it would be possible to divide the function compute_percentages() into 3 separate functions, just for the sake of learning and experimenting.
I have a source dataframe with columns: ['question', 'type', 'value', 'type_code'].
With this I can add a new column called "percentage":

def compute_percentage(df):
    # Compute percentage of value within question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage)
    .reset_index(drop=True)
)

And now I have source dataframe with the following columns:
['question', 'type', 'value', 'type_code', 'percentage']

Now I want to add a new column called percentage_end with the code below, but it does not work properly (it gives wrong results):

def compute_percentage_end(df):

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    perc = df["percentage"]
    df["percentage_end"] = perc.cumsum() - (perc.iloc[-2] + perc.iloc[-1] + perc.iloc[0] / 2)
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage_end)
    .reset_index(drop=True)
)

I do not know why it doesn't work.

Maybe all 3 columns (percentage, percentage_end, percentage_start) have to be calculated together, like in my original code?

But if that is not the case, after figuring out percentage_end I would like to add the last column, percentage_start.

I bought the book "Pandas for Everyone" today. It will arrive next week, and I will study it patiently.
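
One possible explanation for the wrong results, assuming the original compute_percentages() ran on groups indexed by type_code: perc[-2] then means "the row named -2", while perc.iloc[-2] means "the second row from the end". A small sketch with single-question values (the frame layout is an assumption for illustration):

```python
import pandas as pd

# One question, indexed by type_code (-2 .. 2).
df = pd.DataFrame({
    "type_code": [-2, -1, 0, 1, 2],
    "value": [24, 294, 594, 1927, 376],
}).set_index("type_code")

perc = df["value"] / df["value"].sum() * 100

# Label lookup: the row whose type_code is -2 ("Strongly disagree").
by_label = perc.loc[-2]
# Position lookup: the second row from the end ("Agree").
by_position = perc.iloc[-2]

# The centering offset therefore differs between the two styles.
offset_by_label = perc.loc[-2] + perc.loc[-1] + perc.loc[0] / 2
offset_by_position = perc.iloc[-2] + perc.iloc[-1] + perc.iloc[0] / 2

print(by_label, by_position)
print(offset_by_label, offset_by_position)
```

If this is what happened, swapping [] for .iloc changed which rows go into the offset, so every value in percentage_end shifts.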

**deanhystad** · Jul-13-2023, 05:34 PM

You should start with 1 question.

import pandas as pd

df = pd.DataFrame(
    [
        {"type": "Strongly disagree", "value": 24},
        {"type": "Disagree", "value": 294},
        {"type": "Neither agree nor disagree", "value": 594},
        {"type": "Agree", "value": 1927},
        {"type": "Strongly agree", "value": 376},
    ]
)

df["type_code"] = df.type.map(
    {
        "Strongly disagree": -2,
        "Disagree": -1,
        "Neither agree nor disagree": 0,
        "Agree": 1,
        "Strongly agree": 2,
    }
)

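# Index by type_code so that perc[-2], perc[-1], and perc[0] look up rows by label
df = df.set_index("type_code").sort_index()

# Percentage of each answer within the question
perc = (df["value"] / df["value"].sum()) * 100
df["percentage"] = perc

# percentage_end, centered on "Neither agree nor disagree" (type_code 0)
perc = df["percentage"]
df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)

# percentage_start is percentage_end minus the row's own percentage
df["percentage_start"] = df["percentage_end"] - perc

# Drop the type_code index and go back to 0..4 row numbers
df = df.reset_index(drop=True)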

print(df)

Output:
                         type  value  percentage  percentage_end  percentage_start
0           Strongly disagree     24    0.746501      -18.382582        -19.129082
1                    Disagree    294    9.144635       -9.237947        -18.382582
2  Neither agree nor disagree    594   18.475894        9.237947         -9.237947
3                       Agree   1927   59.937792       69.175739          9.237947
4              Strongly agree    376   11.695179       80.870918         69.175739

Grouping and using the apply function on groups is fairly advanced pandas. Save that for later.
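
Once the single-question version is clear, the three steps could be split into separate functions and run once per question without apply(). This is only a sketch: the function names, the process_question() helper, and the Q2 values are invented for illustration, and the loop plus pd.concat() stands in for the groupby().apply() call.

```python
import pandas as pd

# Hypothetical input: Q1 reuses the values above, Q2 is invented.
source = pd.DataFrame({
    "question": ["Q1"] * 5 + ["Q2"] * 5,
    "type_code": [-2, -1, 0, 1, 2] * 2,
    "value": [24, 294, 594, 1927, 376, 10, 20, 40, 20, 10],
})


def add_percentage(df):
    # Step 1: percentage of each answer within one question.
    df = df.copy()
    df["percentage"] = df["value"] / df["value"].sum() * 100
    return df


def add_percentage_end(df):
    # Step 2: expects one question, indexed by type_code and sorted,
    # so .loc[-2], .loc[-1], .loc[0] are label lookups.
    df = df.copy()
    perc = df["percentage"]
    df["percentage_end"] = perc.cumsum() - (perc.loc[-2] + perc.loc[-1] + perc.loc[0] / 2)
    return df


def add_percentage_start(df):
    # Step 3: needs the columns added by steps 1 and 2.
    df = df.copy()
    df["percentage_start"] = df["percentage_end"] - df["percentage"]
    return df


def process_question(group):
    # Run the three steps in order on the rows of a single question.
    group = group.set_index("type_code").sort_index()
    group = add_percentage(group)
    group = add_percentage_end(group)
    group = add_percentage_start(group)
    return group.reset_index()


source = pd.concat(
    [process_question(group) for _, group in source.groupby("question")],
    ignore_index=True,
)
print(source)
```

Each step only depends on columns added by the previous one, which is why they can be separated as long as they run in this order on one question at a time.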

Andrzej_Andrzej · Jul-13-2023, 07:13 PM

When I used my "source" dataframe with your code, I got an error after this part:

perc = (df["value"] / df["value"].sum()) * 100
df["percentage"] = perc
 
perc = df["percentage"]
df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)

Error:
ValueError: cannot reindex on an axis with duplicate labels

**deanhystad** · (This post was last modified: Jul-13-2023, 08:43 PM by deanhystad.)

compute_percentages() only works when processing 1 question at a time. If you want to experiment and see how compute_percentages() works outside of the apply(), you need to reduce the data to 1 question. You cannot do something like compute_percentages() for your entire DataFrame. Grouping is required if the dataframe has more than 1 question.
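
To illustrate why the single-question code breaks on the full frame, here is a sketch with a made-up two-question frame. Once type_code is the index, each label appears once per question, so a label lookup returns several rows instead of one value, which is consistent with the duplicate-labels error above. Filtering down to one question makes the labels unique again.

```python
import pandas as pd

# Hypothetical frame with two questions; values are invented.
source = pd.DataFrame({
    "question": ["Q1"] * 3 + ["Q2"] * 3,
    "type_code": [-1, 0, 1] * 2,
    "value": [1, 2, 1, 3, 4, 3],
}).set_index("type_code")

# Every type_code appears twice, so the index is not unique
# and looking up a label gives a Series with two rows.
labels_unique = source.index.is_unique
lookup_all = source["value"].loc[-1]

# Reduce the data to one question: labels are unique again
# and the same lookup gives a single number.
one_question = source[source["question"] == "Q1"]
one_unique = one_question.index.is_unique
lookup_one = one_question["value"].loc[-1]

print(labels_unique, len(lookup_all))
print(one_unique, lookup_one)
```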

Andrzej_Andrzej · Jul-13-2023, 08:19 PM

(Jul-13-2023, 07:44 PM)deanhystad Wrote: you need to reduce the data to 1 question.

I can see it now. Thank you. By the way, how can I insert a picture here? If I click on the image icon it says: "Enter the image URL:".
What kind of URL should I provide there?

**deanhystad** · (This post was last modified: Jul-13-2023, 08:44 PM by deanhystad.)

All the forum help you will ever need.

https://python-forum.io/misc.php?action=help

In general, posting images is frowned upon unless there is no other way to present the information.
