Python Forum

Pages: 1 2

Provide a python function for the below case and add a column
1) Sum parent weight based on Level_3 and distribute the values based on values in child_weight column for all the rows in Level_3
2) If there is no value in child_weight for a particular group in level_3 then re-distribute across all rows in level_2 within the same group
3) Values in Total column is the expected value after running the function.
Sample DF:
Security CompRating Level_3 Level_2 parent_weight child_weight Total
A A1 Banking Financial 4.486 3.522 4.419
B BBB3 Banking Financial 0.445 0.559 0.701
C A1 Financial Services Financial 0.821 0.994 1.274
D A3 Financial Services Financial 2.744 3.445 4.415
E A2 Financial Services Financial 1.919
F A1 Insurance Financial 0.393

There is a problem with your data, not all lines have the same number of items.

Lines C and D have 8 items, but you only have 7 column names, so where will you put item 8? E has 6 items and F has 5 items. Bit of a problem.

You should make sure your data is correct before you start. I changed the data a bit.

This will give you a database, then perhaps you can explain what it is that you want to do more clearly?

import pandas as pd

# this will cause problems because you only have 7 column names
# but C and D have 8 items E has 6 items and F has 5 items
datastring = """A A1 Banking Financial 4.486 3.522 4.419
B BBB3 Banking Financial 0.445 0.559 0.701
C A1 Financial Services Financial 0.821 0.994 1.274
D A3 Financial Services Financial 2.744 3.445 4.415
E A2 Financial Services Financial 1.919
F A1 Insurance Financial 0.393"""

# revised data
datastring = '''A A1 Banking Financial 4.486 3.522 4.419
B BBB3 Banking Financial 0.445 0.559 0.701
C A1 Financial Services 0.821 0.994 1.274
D A3 Financial Services 2.744 3.445 4.415
E A2 Financial Services 1.919 5.791 2.348
F A1 Insurance Financial 0.393 0.998 3.726'''

cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight', 'Total'] # 7 column names

lists = [s.split() for s in datastring.split('\n')]
df = pd.DataFrame(lists, columns=cols)

You could make an empty dataframe and then read the lists in 1 by 1, but you still need another column name!

Apologies if it was not clear earlier.

Security CompRating Level_3 Level_2 parent_weight child_weight
A A1 Banking Financial 4.486 3.522
B BBB3 Banking Financial 0.445 0.559
C A1 Financial_Services Financial 0.821 0.994
D A3 Financial_Services Financial 2.744 3.445
E A2 Financial_Services Financial 1.919 NULL
F A1 Insurance Financial 0.393 NULL

More clarification for the above case:
Formula Example :
For Security A: (3.522 * (4.486 + 0.445)) / (3.522+0.559) since they all belong to same level_3
same formula will be applicable to all rows with Financial_Services as well
But for Insurance..0.393 will be divided to all rows which has same Level_2 as security F. The top four rows will get additional value from Insurance

That's better, but I am still not clear on what you want to do exactly.

Quote:For Security A: (3.522 * (4.486 + 0.445)) / (3.522+0.559)
For Security A: (child_weight_row0 * (parent_weight_row0 + parent_weight_row1)) / (child_weight_row0 + child_weight_row1)

How does this formula repeat for other rows? I have no knowledge of the Financial Industry or what terminology they use.

As I see it, rows B and F don't need to be in the dataframe, as we just use the constants like 0.445, 0.559 or 0.393.
But row F has CompRating A1, so I am not sure how that fits together!! Is that a mistake?

Perhaps you could clarify?

import pandas as pd

cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight']

# revised data
datastring = """A A1 Banking Financial 4.486 3.522
B BBB3 Banking Financial 0.445 0.559
C A1 Financial_Services Financial 0.821 0.994
D A3 Financial_Services Financial 2.744 3.445
E A2 Financial_Services Financial 1.919 NULL
F A1 Insurance Financial 0.393 NULL"""

lists = [s.split() for s in datastring.split('\n')]
df = pd.DataFrame(lists, columns=cols)
"""
# This is the dataframe you get
df
                  
  Security CompRating             Level_3    Level_2 parent_weight child_weight
0        A         A1             Banking  Financial         4.486        3.522
1        B       BBB3             Banking  Financial         0.445        0.559
2        C         A1  Financial_Services  Financial         0.821        0.994
3        D         A3  Financial_Services  Financial         2.744        3.445
4        E         A2  Financial_Services  Financial         1.919         NULL
5        F         A1           Insurance  Financial         0.393         NULL
"""

# As an example, get all the rows which have CompRating = A1
for index, row in df.iterrows():
    if df['CompRating'][index] == 'A1':
        for col in df.columns:
            print(row[col])

At the bottom is a way of getting values and rows you want, but I am really not sure what you want to calculate, or where you want the result put!

The comp rating column can be ignored.
I need to add another column to the above dataframe by doing the below calcs
First Step: Sum parent_weight for each group in Level_3 and then apply them to each row which has same level_3 on a pro-rata basis using below formula
=(child_weight row0 * SUM(parent_weight_Level_3))/SUM(child_weight_Level_3)
Second Step: Since insurance in Level_3 group does not have any value in child_weight, parent_weight value gets divided among all rows which has same Level_2.
In the above example..0.393 gets divided among top four rows
=(parent_weight row6 * child_weight row0)/sum(child_weight_Level_2)
Third Step:
Add values from step 1 and step 2

At the end sum of parent weight across all rows should equal to sum of additional column created. Let me know if further clarification is required

Attached file has step 1 and step 2 shown as example. Total column is the additional column that needs to be added to the dataframe after running the function.

Quote:Second Step: Since insurance in Level_3 group does not have any value in child_weight,
parent_weight value gets divided among all rows which has same Level_2.

All 5 rows have Financial in Level_2, how is the above supposed to work?
Did you really check the data?

import pandas as pd

cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight']

# revised data
datastring = """A A1 Banking Financial 4.486 3.522
B BBB3 Banking Financial 0.445 0.559
C A1 Financial_Services Financial 0.821 0.994
D A3 Financial_Services Financial 2.744 3.445
E A2 Financial_Services Financial 1.919 0.456
F A1 Insurance Financial 0.393 NULL"""

lists = [s.split() for s in datastring.split('\n')]
df = pd.DataFrame(lists, columns=cols)
# set the number of columns to show 
pd.set_option('display.max_columns', 10)
# add a column with a default value of 0 or set all values to 0 if df['Totals'] exists
df['Totals'] = 0
# the numbers are strings, make them floats
df['parent_weight'] = pd.to_numeric(df.parent_weight, errors='coerce')
df['child_weight'] = pd.to_numeric(df.child_weight, errors='coerce')

# calculate totals and write to Totals for Banking and Financial_Services
for index in df.index:
    # step 1.1 make sums
    if df['Level_3'][index] == 'Banking':
        sumparent = df.loc[df['Level_3'] == 'Banking', 'parent_weight'].sum()
        sumchild = df.loc[df['Level_3'] == 'Banking', 'child_weight'].sum()
        total = df['Totals'][index] + (df['child_weight'][index] * sumparent/sumchild)
        df.at[index, 'Totals'] = total
    elif df['Level_3'][index] == 'Financial_Services':
        sumparent = df.loc[df['Level_3'] == 'Financial_Services', 'parent_weight'].sum()
        sumchild = df.loc[df['Level_3'] == 'Financial_Services', 'child_weight'].sum()
        total = df['Totals'][index] + (df['child_weight'][index] * sumparent/sumchild)
        df.at[index, 'Totals'] = total

Output:  Security CompRating             Level_3    Level_2  parent_weight  \
0        A         A1             Banking  Financial          4.486   
1        B       BBB3             Banking  Financial          0.445   
2        C         A1  Financial_Services  Financial          0.821   
3        D         A3  Financial_Services  Financial          2.744   
4        E         A2  Financial_Services  Financial          1.919   
5        F         A1           Insurance  Financial          0.393   

   child_weight    Totals  
0         3.522  4.255570  
1         0.559  0.675430  
2         0.994  1.113605  
3         3.445  3.859526  
4         0.456  0.510869  
5           NaN  0.000000

Sorry, but I don't understand what you want to do with the row Insurance, please explain.

For the insurance, please distribute values row which has similar Level_2 and has a child weight.

The Excel you posted was what I needed!

myApp() below will get what you want, but it is not flexible, because, for Banking , the first step uses SUM($D$2:$D$3)/SUM($E$2:$E$3) 2 times. What if, in a real table you have many rows with Banking?

If Level_3 = Financial_Services we have: SUM($D$4:$D$6)/SUM($E$4:$E$6), 3 times.

Its hard to see the repetitive nature in a large table. Do you have a bigger Excel?

As it stands, this is only useful for these data. But then, I am no expert with Pandas!

I think you could shorten this by using .group_by and you don't really need columns Step1 and Step2, that is just for testing.

import pandas as pd

def myApp():
    cols = ['Security', 'CompRating', 'Level_3', 'Level_2', 'parent_weight', 'child_weight']

    # revised data
    datastring = """A A1 Banking Financial 4.486 3.522
    B BBB3 Banking Financial 0.445 0.559
    C A1 Financial_Services Financial 0.821 0.994
    D A3 Financial_Services Financial 2.744 3.445
    E A2 Financial_Services Financial 1.919 NULL
    F A1 Insurance Financial 0.393 NULL"""

    lists = [s.split() for s in datastring.split('\n')]
    df = pd.DataFrame(lists, columns=cols)

    # set the number of columns to show 
    pd.set_option('display.max_columns', 10)
    # add a column with a default value of 0 or set all values to 0 if df['Totals'] exists
    df['Step1'] = 0
    df['Step2'] = 0
    df['Totals'] = 0

    # the numbers are strings, make them floats
    df['parent_weight'] = pd.to_numeric(df.parent_weight, errors='coerce')
    df['child_weight'] = pd.to_numeric(df.child_weight, errors='coerce')

    for index in df.index:
        print('Index is:', index, 'Level_3 is', df['Level_3'][index])
        # step 1 make sums for Banking
        if df['Level_3'][index] == 'Banking' and pd.notna(df['child_weight'][index]):
            sumparent = df['parent_weight'][0:2].sum()
            sumchild =  df['child_weight'][0:2].sum()
            step1 = df['child_weight'][index] * (sumparent/sumchild)
            print(f'Level_3 is Banking, Step1 is {step1}')
            df.at[index, 'Step1'] = step1       
        elif df['Level_3'][index] == 'Financial_Services' and pd.notna(df['child_weight'][index]):
            sumparent = df['parent_weight'][2:5].sum()
            sumchild =  df['child_weight'][2:5].sum()
            step1 = df['child_weight'][index] * (sumparent/sumchild)
            print(f'Level_3 is Financial_Services, Step1 is {step1}')
            df.at[index, 'Step1'] = step1
            # step 2 case Insurance
        elif df['Level_3'][index] == 'Insurance':
            sumchild = df['child_weight'].sum()
            for jindex in df.index:
                if pd.notna(df['child_weight'][jindex]):
                    step2 = df['child_weight'][jindex] * (df['parent_weight'][5]/sumchild)
                    print(f'Level_3 is Insurance, Step2 is {step2}')
                    df.at[jindex, 'Step2'] = step2
                    
    for index in df.index:
        total = df['Step1'][index] + df['Step2'][index]
        df.at[index, 'Totals'] = total

    print(f'Sum of parent_weight = {df["parent_weight"].sum()}')
    print(f'Sum of Totals = {df["Totals"].sum()}')

myapp() gives this df:

Output:  Security CompRating             Level_3    Level_2  parent_weight  \
0        A         A1             Banking  Financial          4.486   
1        B       BBB3             Banking  Financial          0.445   
2        C         A1  Financial_Services  Financial          0.821   
3        D         A3  Financial_Services  Financial          2.744   
4        E         A2  Financial_Services  Financial          1.919   
5        F         A1           Insurance  Financial          0.393   

   child_weight     Step1     Step2    Totals  
0         3.522  4.255570  0.162458  4.418029  
1         0.559  0.675430  0.025785  0.701215  
2         0.994  1.228001  0.045850  1.273851  
3         3.445  4.255999  0.158907  4.414906  
4           NaN       NaN  0.000000       NaN  
5           NaN  0.000000       NaN       NaN

Pages: 1 2

11drk9

Pedroski55

11drk9

11drk9

Pedroski55

11drk9

11drk9

Pedroski55

11drk9

Pedroski55