Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Weight Distribution
#1
Provide a python function for the below case and add a column
1) Sum parent weight based on Level_3 and distribute the values based on values in child_weight column for all the rows in Level_3
2) If there is no value in child_weight for a particular group in level_3 then re-distribute across all rows in level_2 within the same group
3) Values in Total column is the expected value after running the function.
Sample DF:
Security CompRating Level_3 Level_2 parent_weight child_weight Total
A A1 Banking Financial 4.486 3.522 4.419
B BBB3 Banking Financial 0.445 0.559 0.701
C A1 Financial Services Financial 0.821 0.994 1.274
D A3 Financial Services Financial 2.744 3.445 4.415
E A2 Financial Services Financial 1.919
F A1 Insurance Financial 0.393
Reply
#2
There is a problem with your data, not all lines have the same number of items.

Lines C and D have 8 items, but you only have 7 column names, so where will you put item 8? E has 6 items and F has 5 items. Bit of a problem.

You should make sure your data is correct before you start. I changed the data a bit.

This will give you a database, then perhaps you can explain what it is that you want to do more clearly?

import pandas as pd

# this will cause problems because you only have 7 column names
# but C and D have 8 items E has 6 items and F has 5 items
datastring = """A A1 Banking Financial 4.486 3.522 4.419
B BBB3 Banking Financial 0.445 0.559 0.701
C A1 Financial Services Financial 0.821 0.994 1.274
D A3 Financial Services Financial 2.744 3.445 4.415
E A2 Financial Services Financial 1.919
F A1 Insurance Financial 0.393"""

# revised data
datastring = '''A A1 Banking Financial 4.486 3.522 4.419
B BBB3 Banking Financial 0.445 0.559 0.701
C A1 Financial Services 0.821 0.994 1.274
D A3 Financial Services 2.744 3.445 4.415
E A2 Financial Services 1.919 5.791 2.348
F A1 Insurance Financial 0.393 0.998 3.726'''

cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight', 'Total'] # 7 column names

lists = [s.split() for s in datastring.split('\n')]
df = pd.DataFrame(lists, columns=cols)
You could make an empty dataframe and then read the lists in 1 by 1, but you still need another column name!
Reply
#3
Apologies if it was not clear earlier.

Security CompRating Level_3 Level_2 parent_weight child_weight
A A1 Banking Financial 4.486 3.522
B BBB3 Banking Financial 0.445 0.559
C A1 Financial_Services Financial 0.821 0.994
D A3 Financial_Services Financial 2.744 3.445
E A2 Financial_Services Financial 1.919 NULL
F A1 Insurance Financial 0.393 NULL
Reply
#4
More clarification for the above case:
Formula Example :
For Security A: (3.522 * (4.486 + 0.445)) / (3.522+0.559) since they all belong to same level_3
same formula will be applicable to all rows with Financial_Services as well
But for Insurance..0.393 will be divided to all rows which has same Level_2 as security F. The top four rows will get additional value from Insurance
Reply
#5
That's better, but I am still not clear on what you want to do exactly.

Quote:For Security A: (3.522 * (4.486 + 0.445)) / (3.522+0.559)
For Security A: (child_weight_row0 * (parent_weight_row0 + parent_weight_row1)) / (child_weight_row0 + child_weight_row1)

How does this formula repeat for other rows? I have no knowledge of the Financial Industry or what terminology they use.

As I see it, rows B and F don't need to be in the dataframe, as we just use the constants like 0.445, 0.559 or 0.393.
But row F has CompRating A1, so I am not sure how that fits together!! Is that a mistake?

Perhaps you could clarify?

import pandas as pd

cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight']

# revised data
datastring = """A A1 Banking Financial 4.486 3.522
B BBB3 Banking Financial 0.445 0.559
C A1 Financial_Services Financial 0.821 0.994
D A3 Financial_Services Financial 2.744 3.445
E A2 Financial_Services Financial 1.919 NULL
F A1 Insurance Financial 0.393 NULL"""

lists = [s.split() for s in datastring.split('\n')]
df = pd.DataFrame(lists, columns=cols)
"""
# This is the dataframe you get
df
                  
  Security CompRating             Level_3    Level_2 parent_weight child_weight
0        A         A1             Banking  Financial         4.486        3.522
1        B       BBB3             Banking  Financial         0.445        0.559
2        C         A1  Financial_Services  Financial         0.821        0.994
3        D         A3  Financial_Services  Financial         2.744        3.445
4        E         A2  Financial_Services  Financial         1.919         NULL
5        F         A1           Insurance  Financial         0.393         NULL
"""

# As an example, get all the rows which have CompRating = A1
for index, row in df.iterrows():
    if df['CompRating'][index] == 'A1':
        for col in df.columns:
            print(row[col])
At the bottom is a way of getting values and rows you want, but I am really not sure what you want to calculate, or where you want the result put!
Reply
#6
The comp rating column can be ignored.
I need to add another column to the above dataframe by doing the below calcs
First Step: Sum parent_weight for each group in Level_3 and then apply them to each row which has same level_3 on a pro-rata basis using below formula
=(child_weight row0 * SUM(parent_weight_Level_3))/SUM(child_weight_Level_3)
Second Step: Since insurance in Level_3 group does not have any value in child_weight, parent_weight value gets divided among all rows which has same Level_2.
In the above example..0.393 gets divided among top four rows
=(parent_weight row6 * child_weight row0)/sum(child_weight_Level_2)
Third Step:
Add values from step 1 and step 2

At the end sum of parent weight across all rows should equal to sum of additional column created. Let me know if further clarification is required
Reply
#7
Attached file has step 1 and step 2 shown as example. Total column is the additional column that needs to be added to the dataframe after running the function.

Attached Files

.xlsx   Agg_Example.xlsx (Size: 9.77 KB / Downloads: 27)
Reply
#8
Quote:Second Step: Since insurance in Level_3 group does not have any value in child_weight,
parent_weight value gets divided among all rows which has same Level_2.

All 5 rows have Financial in Level_2, how is the above supposed to work?
Did you really check the data?

import pandas as pd

cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight']

# revised data
datastring = """A A1 Banking Financial 4.486 3.522
B BBB3 Banking Financial 0.445 0.559
C A1 Financial_Services Financial 0.821 0.994
D A3 Financial_Services Financial 2.744 3.445
E A2 Financial_Services Financial 1.919 0.456
F A1 Insurance Financial 0.393 NULL"""

lists = [s.split() for s in datastring.split('\n')]
df = pd.DataFrame(lists, columns=cols)
# set the number of columns to show 
pd.set_option('display.max_columns', 10)
# add a column with a default value of 0 or set all values to 0 if df['Totals'] exists
df['Totals'] = 0
# the numbers are strings, make them floats
df['parent_weight'] = pd.to_numeric(df.parent_weight, errors='coerce')
df['child_weight'] = pd.to_numeric(df.child_weight, errors='coerce')

# calculate totals and write to Totals for Banking and Financial_Services
for index in df.index:
    # step 1.1 make sums
    if df['Level_3'][index] == 'Banking':
        sumparent = df.loc[df['Level_3'] == 'Banking', 'parent_weight'].sum()
        sumchild = df.loc[df['Level_3'] == 'Banking', 'child_weight'].sum()
        total = df['Totals'][index] + (df['child_weight'][index] * sumparent/sumchild)
        df.at[index, 'Totals'] = total
    elif df['Level_3'][index] == 'Financial_Services':
        sumparent = df.loc[df['Level_3'] == 'Financial_Services', 'parent_weight'].sum()
        sumchild = df.loc[df['Level_3'] == 'Financial_Services', 'child_weight'].sum()
        total = df['Totals'][index] + (df['child_weight'][index] * sumparent/sumchild)
        df.at[index, 'Totals'] = total
Output:
Security CompRating Level_3 Level_2 parent_weight \ 0 A A1 Banking Financial 4.486 1 B BBB3 Banking Financial 0.445 2 C A1 Financial_Services Financial 0.821 3 D A3 Financial_Services Financial 2.744 4 E A2 Financial_Services Financial 1.919 5 F A1 Insurance Financial 0.393 child_weight Totals 0 3.522 4.255570 1 0.559 0.675430 2 0.994 1.113605 3 3.445 3.859526 4 0.456 0.510869 5 NaN 0.000000
Sorry, but I don't understand what you want to do with the row Insurance, please explain.
Reply
#9
For the insurance, please distribute values row which has similar Level_2 and has a child weight.
Reply
#10
The Excel you posted was what I needed!

myApp() below will get what you want, but it is not flexible, because, for Banking , the first step uses SUM($D$2:$D$3)/SUM($E$2:$E$3) 2 times. What if, in a real table you have many rows with Banking?

If Level_3 = Financial_Services we have: SUM($D$4:$D$6)/SUM($E$4:$E$6), 3 times.

Its hard to see the repetitive nature in a large table. Do you have a bigger Excel?

As it stands, this is only useful for these data. But then, I am no expert with Pandas!

I think you could shorten this by using .group_by and you don't really need columns Step1 and Step2, that is just for testing.

import pandas as pd

def myApp():
    cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight']

    # revised data
    datastring = """A A1 Banking Financial 4.486 3.522
    B BBB3 Banking Financial 0.445 0.559
    C A1 Financial_Services Financial 0.821 0.994
    D A3 Financial_Services Financial 2.744 3.445
    E A2 Financial_Services Financial 1.919 NULL
    F A1 Insurance Financial 0.393 NULL"""

    lists = [s.split() for s in datastring.split('\n')]
    df = pd.DataFrame(lists, columns=cols)

    # set the number of columns to show 
    pd.set_option('display.max_columns', 10)
    # add a column with a default value of 0 or set all values to 0 if df['Totals'] exists
    df['Step1'] = 0
    df['Step2'] = 0
    df['Totals'] = 0

    # the numbers are strings, make them floats
    df['parent_weight'] = pd.to_numeric(df.parent_weight, errors='coerce')
    df['child_weight'] = pd.to_numeric(df.child_weight, errors='coerce')

    for index in df.index:
        print('Index is:', index, 'Level_3 is', df['Level_3'][index])
        # step 1 make sums for Banking
        if df['Level_3'][index] == 'Banking' and pd.notna(df['child_weight'][index]):
            sumparent = df['parent_weight'][0:2].sum()
            sumchild =  df['child_weight'][0:2].sum()
            step1 = df['child_weight'][index] * (sumparent/sumchild)
            print(f'Level_3 is Banking, Step1 is {step1}')
            df.at[index, 'Step1'] = step1       
        elif df['Level_3'][index] == 'Financial_Services' and pd.notna(df['child_weight'][index]):
            sumparent = df['parent_weight'][2:5].sum()
            sumchild =  df['child_weight'][2:5].sum()
            step1 = df['child_weight'][index] * (sumparent/sumchild)
            print(f'Level_3 is Financial_Services, Step1 is {step1}')
            df.at[index, 'Step1'] = step1
            # step 2 case Insurance
        elif df['Level_3'][index] == 'Insurance':
            sumchild = df['child_weight'].sum()
            for jindex in df.index:
                if pd.notna(df['child_weight'][jindex]):
                    step2 = df['child_weight'][jindex] * (df['parent_weight'][5]/sumchild)
                    print(f'Level_3 is Insurance, Step2 is {step2}')
                    df.at[jindex, 'Step2'] = step2
                    
    for index in df.index:
        total = df['Step1'][index] + df['Step2'][index]
        df.at[index, 'Totals'] = total

    print(f'Sum of parent_weight = {df["parent_weight"].sum()}')
    print(f'Sum of Totals = {df["Totals"].sum()}')
myapp() gives this df:

Output:
Security CompRating Level_3 Level_2 parent_weight \ 0 A A1 Banking Financial 4.486 1 B BBB3 Banking Financial 0.445 2 C A1 Financial_Services Financial 0.821 3 D A3 Financial_Services Financial 2.744 4 E A2 Financial_Services Financial 1.919 5 F A1 Insurance Financial 0.393 child_weight Step1 Step2 Totals 0 3.522 4.255570 0.162458 4.418029 1 0.559 0.675430 0.025785 0.701215 2 0.994 1.228001 0.045850 1.273851 3 3.445 4.255999 0.158907 4.414906 4 NaN NaN 0.000000 NaN 5 NaN 0.000000 NaN NaN
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  distribution fit Glaucio 1 239 Apr-07-2024, 12:30 AM
Last Post: Larz60+
Information Best distribution method inovermyhead100 0 568 Jul-19-2023, 07:39 AM
Last Post: inovermyhead100
  How do I use a whl puython distribution? barryjo 6 1,800 Aug-15-2022, 03:00 AM
Last Post: barryjo
  Help with basic weight converter PythonSquizzel 3 1,423 Jun-29-2022, 02:42 PM
Last Post: deanhystad
  Python Networkx: Visualize an edge weight with a bubble/circle uvw 0 2,011 Sep-01-2021, 06:26 AM
Last Post: uvw
  Coin Toss - Distribution lasek723 6 3,124 Oct-04-2020, 01:36 PM
Last Post: deanhystad

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020