Python Forum
Negative indexing/selecting working and not working
#11
Thank you again for your patience and clarification.
I will thoroughly study what you have written, because this is a lot to digest for me.
I am very grateful for your replies.
best regards,
Andrzej
Reply
#12
The main thing to remember is that indexing for arrays and lists treats the index as an ordinal position. list[0] is the first element. array[5] is the 6th element. list[-2] is the second-to-last element, and it raises an IndexError if the list has only 1 element.

Indexing in pandas is like indexing in dictionaries. The index in series[5] treats 5 as a key, not an ordinal number. series[5] could be the first item in the series or the last item in the series. There is no way to know because the "5" is not the "position" of the row, it is the "name" of the row, and these row names may have nothing in common with the position of the row. When you ask for series[5], pandas searches "series" for a row index that matches "5" and returns the associated value.
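A minimal sketch of the difference described above. The labels and values are made up for illustration, and .loc / .iloc are used so that the label lookup and the position lookup are both explicit:

```python
import pandas as pd

letters = ["a", "b", "c"]
# Lists index by position: -2 counts back from the end.
second_to_last = letters[-2]

# The integer labels here are row "names"; they are deliberately out of order.
series = pd.Series(["first", "second", "third"], index=[5, 0, -2])

by_label = series.loc[5]          # row named 5, which happens to be the first row
by_label_neg = series.loc[-2]     # row named -2, which happens to be the last row
by_position = series.iloc[-2]     # second-to-last row, whatever its name is

print(second_to_last, by_label, by_label_neg, by_position)
```

With an integer index, the label -2 and the position -2 can point at different rows, which is why the two lookups above return different values.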
Reply
#13
Two more questions if I may, please:
1. How do I make the object perc, which is local because it is inside a function, global, so that it does not disappear after compute_percentages() finishes its work?
I have done it previously by:
perc = (source["value"] / source["value"].sum()) * 100
2. If I would like to split that function into three independent but consecutive functions that add the columns percentage, percentage_end, and percentage_start in turn, step by step, updating the "source" dataframe at every step, how do I do it?

I have done it for percentage column like this:
def compute_percentage(df):
    # Compute percentage of value within question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage)
    .reset_index(drop=True)
)
but I can't create the remaining two, percentage_end and percentage_start. Is it possible, or must all 3 of them be calculated together as in my original code from my first post?
Reply
#14
1. You don't want to do this. compute_percentages() is called multiple times, once for each question group. The perc inside the function changes each time the function is called. If you could see perc outside the function you would only see the perc for the last question.
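A small sketch of why a global perc would only show the last question, assuming a toy "source" with two questions (the data and the names last_perc and group_totals are hypothetical). The loop stands in for what happens once per group; the transform() call is one way to keep a percentage for every row instead:

```python
import pandas as pd

source = pd.DataFrame(
    {
        "question": ["Q1", "Q1", "Q2", "Q2"],
        "value": [1, 3, 2, 2],
    }
)

# Recomputed once per question, so each pass overwrites the previous result.
last_perc = None
for question, group in source.groupby("question"):
    last_perc = (group["value"] / group["value"].sum()) * 100

# Only the values for the last question ("Q2") survive.
print(last_perc.tolist())

# Keeping a result for every row: divide by the per-question total.
group_totals = source.groupby("question")["value"].transform("sum")
source["percentage"] = source["value"] / group_totals * 100
print(source)
```

Storing the result as a column of the dataframe, rather than in a global variable, keeps every question's percentages available after the calculation.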

2. I do not understand your question. compute_percentages() already calculates percentage, percentage_start, and percentage_end. Why are you writing these again?

You should describe what you are trying to achieve, not how you plan to achieve it. As you said, you are a beginner, and maybe your approach to the problem is completely wrong. My early attempts using pandas resulted in bloated, messy code that was replaced with a few simple commands once I had a clue about how pandas works.
Reply
#15
(Jul-13-2023, 02:48 PM)deanhystad Wrote: 2. I do not understand your question. compute_percentages() already calculates percentage, percentage_start, and percentage_end. Why are you writing these again?

You should describe what you are trying to achieve, not how you plan to achieve it. As you said, you are a beginner, and maybe your approach to the problem is completely wrong. My early attempts using pandas resulted in bloated, messy code that was replaced with a few simple commands once I had a clue about how pandas works.

This is what I would like to achieve:
About 2: the code from my first post works perfectly, and all your explanations are helping me start to understand it. I would like to ask whether it would be possible to divide the function compute_percentages() into 3 separate functions, just for learning and experimenting.
I have a source dataframe with columns: ['question', 'type', 'value', 'type_code'].
With this I can add a new column called "percentage":
def compute_percentage(df):
    # Compute percentage of value within question group
    perc = (df["value"] / df["value"].sum()) * 100
    df["percentage"] = perc
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage)
    .reset_index(drop=True)
)
And now I have source dataframe with the following columns:
['question', 'type', 'value', 'type_code', 'percentage']

Now I want to add a new column called percentage_end with the following code presented below (but this is not working properly - it gives wrong results):
def compute_percentage_end(df):

    # Compute percentage end, centered on "Neither agree nor disagree" (type_code 0)
    perc = df["percentage"]
    df["percentage_end"] = perc.cumsum() - (perc.iloc[-2] + perc.iloc[-1] + perc.iloc[0] / 2)
    return df

source = (
    source
    .groupby("question", group_keys=True)
    .apply(compute_percentage_end)
    .reset_index(drop=True)
)
I do not know why it doesn't work.

Maybe all 3 columns (percentage, percentage_end, percentage_start) have to be calculated together, as in my original code?

If that is not the case, then after figuring out percentage_end I would like to add the last column, percentage_start.
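One likely reason for the wrong results, sketched under the assumption that the rows of each question are in type_code order -2, -1, 0, 1, 2 and that the index is the default 0..4 after reset_index(drop=True). In that situation .iloc[-2] and .iloc[-1] pick the last two rows by position, not the rows whose type_code is -2 and -1. The percentages below are made up:

```python
import pandas as pd

type_codes = [-2, -1, 0, 1, 2]
percentages = [10.0, 20.0, 40.0, 20.0, 10.0]

# Default index 0..4: .iloc counts positions, so -2 and -1 mean the last two rows.
perc = pd.Series(percentages)
positional_offset = perc.iloc[-2] + perc.iloc[-1] + perc.iloc[0] / 2

# Indexed by type_code: .loc looks rows up by their label.
by_code = pd.Series(percentages, index=type_codes)
label_offset = by_code.loc[-2] + by_code.loc[-1] + by_code.loc[0] / 2

print(positional_offset)  # "Agree" + "Strongly agree" + half of "Strongly disagree"
print(label_offset)       # "Strongly disagree" + "Disagree" + half of neutral
```

The two offsets differ, so subtracting the positional one from the cumulative sum shifts the bars by the wrong amount.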

I have bought a book today "Pandas for everyone" that will come next week, so I will be studying it patiently.
Reply
#16
You should start with 1 question.
import pandas as pd

df = pd.DataFrame(
    [
        {"type": "Strongly disagree", "value": 24},
        {"type": "Disagree", "value": 294},
        {"type": "Neither agree nor disagree", "value": 594},
        {"type": "Agree", "value": 1927},
        {"type": "Strongly agree", "value": 376},
    ]
)

df["type_code"] = df.type.map(
    {
        "Strongly disagree": -2,
        "Disagree": -1,
        "Neither agree nor disagree": 0,
        "Agree": 1,
        "Strongly agree": 2,
    }
)

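# Use type_code as the row labels so perc[-2], perc[-1] and perc[0] look up rows by label.
df = df.set_index("type_code").sort_index()

# Share of each answer in the question total, in percent.
perc = (df["value"] / df["value"].sum()) * 100
df["percentage"] = perc

# Shift the cumulative sum so the scale is centered on "Neither agree nor disagree" (type_code 0).
df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)

# Each bar starts where it ends, minus its own width.
df["percentage_start"] = df["percentage_end"] - perc

df = df.reset_index(drop=True)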

print(df)
Output:
                         type  value  percentage  percentage_end  percentage_start
0           Strongly disagree     24    0.746501      -18.382582        -19.129082
1                    Disagree    294    9.144635       -9.237947        -18.382582
2  Neither agree nor disagree    594   18.475894        9.237947         -9.237947
3                       Agree   1927   59.937792       69.175739          9.237947
4              Strongly agree    376   11.695179       80.870918         69.175739
Grouping and using the apply function on groups is fairly advanced pandas. Save that for later.
Reply
#17
When I run your code on my "source" dataframe, this part raises an error:
perc = (df["value"] / df["value"].sum()) * 100
df["percentage"] = perc
 
perc = df["percentage"]
df["percentage_end"] = perc.cumsum() - (perc[-2] + perc[-1] + perc[0] / 2)
Error:
ValueError: cannot reindex on an axis with duplicate labels
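A plausible cause, assuming the full "source" dataframe was indexed by type_code while still holding several questions: every type_code then appears once per question, so the labels are no longer unique and a lookup such as perc[-2] returns several rows instead of one number. The sketch below uses made-up values and only shows the duplicate labels, not the exact error:

```python
import pandas as pd

# Two questions sharing the same type_code labels (only -2 and 2 shown for brevity).
perc = pd.Series([10.0, 90.0, 40.0, 60.0], index=[-2, 2, -2, 2])

labels_are_unique = perc.index.is_unique
matches = perc.loc[-2]   # a Series with one row per question, not a single value

print(labels_are_unique)
print(matches)
```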
Reply
#18
compute_percentages() only works when processing 1 question at a time. If you want to experiment and see how compute_percentages() works outside of the apply(), you need to reduce the data to 1 question. You cannot do something like compute_percentages() for your entire DataFrame. Grouping is required if the dataframe has more than 1 question.
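For experimenting with the three-step version on more than one question, here is a hedged sketch. It is not the code from the first post: per_question() is a hypothetical helper that loops over the groups instead of using apply(), and the offset is selected with a mask on type_code rather than with perc[-2] style lookups. Q1 uses the values from the single-question example; Q2 is made up:

```python
import pandas as pd


def add_percentage(group):
    """Step 1: share of each answer within one question, in percent."""
    group = group.copy()
    group["percentage"] = group["value"] / group["value"].sum() * 100
    return group


def add_percentage_end(group):
    """Step 2: cumulative percentage, shifted so type_code 0 straddles zero."""
    group = group.sort_values("type_code").copy()
    perc = group["percentage"]
    code = group["type_code"]
    # Everything left of neutral plus half of neutral, chosen by type_code value.
    offset = perc[code < 0].sum() + perc[code == 0].sum() / 2
    group["percentage_end"] = perc.cumsum() - offset
    return group


def add_percentage_start(group):
    """Step 3: each bar starts where it ends, minus its own width."""
    group = group.copy()
    group["percentage_start"] = group["percentage_end"] - group["percentage"]
    return group


def per_question(df, step):
    """Run one step on each question separately, then glue the pieces together."""
    pieces = [step(group) for _, group in df.groupby("question")]
    return pd.concat(pieces, ignore_index=True)


codes = [-2, -1, 0, 1, 2]
source = pd.DataFrame(
    {
        "question": ["Q1"] * 5 + ["Q2"] * 5,
        "type_code": codes + codes,
        "value": [24, 294, 594, 1927, 376, 10, 20, 40, 20, 10],
    }
)

# The "source" dataframe is updated after every step.
source = per_question(source, add_percentage)
source = per_question(source, add_percentage_end)
source = per_question(source, add_percentage_start)
print(source)
```

Each step only needs the columns produced by the previous steps, so the order of the three calls matters.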
Reply
#19
(Jul-13-2023, 07:44 PM)deanhystad Wrote: you need to reduce the data to 1 question.

I can see it now. Thank you. By the way, how can I insert a picture here ? If I click on image icon it says: "Enter the image URL:".
What kind of URL should I provide over there ?
Reply
#20
All the forum help you will ever need.

https://python-forum.io/misc.php?action=help

In general, posting images is frowned upon unless there is no other way to present the information.
Reply

