Python Forum
Matrix Search and Calculation Pandas - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Matrix Search and Calculation Pandas (/thread-23866.html)



Matrix Search and Calculation Pandas - luxlambo227 - Jan-21-2020

Dear all,

I have problems with a matrix search and calculation. This is my problem.
I have a df1 with different year values for each indicator and country.
I have a df2 with combinations of countries and yearly values.
The desired output is df3 where the product of the indicators fo the countires per year is created and calcualted.

These are the Dataframe:

df1 =pd.DataFrame({'Country':['Armenia','Azerbaidjan','Belarus','Armenia','Azerbaidjan','Belarus'],\
             'Indictaor':['G','G','G','H', 'H', 'H'],'2005':[3,4,5,6,7,4],'2006':[6,3,1,3,5,6]})
 	

[python]
df2 = pd.DataFrame({'Year':[2005,2006,2005,2006],
                    'Country1':['Armenia','Armenia','Azerbaidjan','Azerbaidjan'],
                    'Country2': ['Belarus','Belarus','Belarus','Belarus']})
df3 = pd.DataFrame({'Year':[2005,2006,2005,2006],                   
                    'Country2': ['Belarus','Belarus','Belarus','Belarus'],
                    'Country1':['Armenia','Armenia','Azerbaidjan','Azerbaidjan'],
                     'IndictaorGProduct':[15,6,35,5],
                      'IndictaorHProduct':[24,18,28,30]})
I am extremly thankful if someone could help with this problem as this is really confusing.

Thank you.

Kidn regards


RE: Matrix Search and Calculation Pandas - jefsummers - Jan-21-2020

Got it started, the challenge left to you is getting the cell values for the respective countries, G or H, and year and multiplying putting the entries into the last columns in df4, the table that eventually should look like df3.
import pandas as pd
df1 =pd.DataFrame({'Country':['Armenia','Azerbaidjan','Belarus','Armenia','Azerbaidjan','Belarus'],\
             'Indictaor':['G','G','G','H', 'H', 'H'],'2005':[3,4,5,6,7,4],'2006':[6,3,1,3,5,6]})
print(df1)
df2 = pd.DataFrame({'Year':[2005,2006,2005,2006],
                    'Country1':['Armenia','Armenia','Azerbaidjan','Azerbaidjan'],
                    'Country2': ['Belarus','Belarus','Belarus','Belarus']})
print(df2)
df3 = pd.DataFrame({'Year':[2005,2006,2005,2006],                   
                    'Country2': ['Belarus','Belarus','Belarus','Belarus'],
                    'Country1':['Armenia','Armenia','Azerbaidjan','Azerbaidjan'],
                     'IndictaorGProduct':[15,6,35,5],
                      'IndictaorHProduct':[24,18,28,30]})
print(df3)
cols = ['Year','Country2','Country1','GProduct','HProduct']
df4 = pd.DataFrame(columns=cols)
df4['Year'] = df2['Year']
df4['Country1'] = df2['Country1']
df4['Country2'] = df2['Country2']
for row in df4 :
    #Get the G and H parameters for the countries and years and multiply
    pass

print(df4)
Output:
Country Indictaor 2005 2006 0 Armenia G 3 6 1 Azerbaidjan G 4 3 2 Belarus G 5 1 3 Armenia H 6 3 4 Azerbaidjan H 7 5 5 Belarus H 4 6 Year Country1 Country2 0 2005 Armenia Belarus 1 2006 Armenia Belarus 2 2005 Azerbaidjan Belarus 3 2006 Azerbaidjan Belarus Year Country2 Country1 IndictaorGProduct IndictaorHProduct 0 2005 Belarus Armenia 15 24 1 2006 Belarus Armenia 6 18 2 2005 Belarus Azerbaidjan 35 28 3 2006 Belarus Azerbaidjan 5 30 Year Country2 Country1 GProduct HProduct 0 2005 Belarus Armenia NaN NaN 1 2006 Belarus Armenia NaN NaN 2 2005 Belarus Azerbaidjan NaN NaN 3 2006 Belarus Azerbaidjan NaN NaN



RE: Matrix Search and Calculation Pandas - luxlambo227 - Jan-22-2020

Thank you. Creating the columns is the easy part. I can do that also. The calcualtion is the difficult part.
The main Problem is that tha columns heading can not be identified i get an key error.


RE: Matrix Search and Calculation Pandas - jefsummers - Jan-22-2020

There are probably (definitely) better ways to do this, but this works to do the G product anyway. I leave the H to you - should be able to copy the G code in the for loop to do similar for the H data.
import pandas as pd
df1 =pd.DataFrame({'Country':['Armenia','Azerbaidjan','Belarus','Armenia','Azerbaidjan','Belarus'],\
             'Indictaor':['G','G','G','H', 'H', 'H'],'2005':[3,4,5,6,7,4],'2006':[6,3,1,3,5,6]})
df2 = pd.DataFrame({'Year':[2005,2006,2005,2006],
                    'Country1':['Armenia','Armenia','Azerbaidjan','Azerbaidjan'],
                    'Country2': ['Belarus','Belarus','Belarus','Belarus']})
df3 = pd.DataFrame({'Year':[2005,2006,2005,2006],                   
                    'Country2': ['Belarus','Belarus','Belarus','Belarus'],
                    'Country1':['Armenia','Armenia','Azerbaidjan','Azerbaidjan'],
                     'IndictaorGProduct':[15,6,35,5],
                      'IndictaorHProduct':[24,18,28,30]})
cols = ['Year','Country2','Country1','GProduct','HProduct']
df4 = pd.DataFrame(columns=cols)
df4['Year'] = df2['Year']
df4['Country1'] = df2['Country1']
df4['Country2'] = df2['Country2']
gprod = []
for row in df4.iterrows() :
    c1 = row[1][2]
    c2 = row[1][1]
    yr = str(row[1][0])
    g1 = df1.loc[(df1['Country']==c1)&(df1['Indictaor']=='G')]
    g1val = g1[yr].values[0]
    g2 = df1.loc[(df1['Country']==c2)&(df1['Indictaor']=='G')]
    g2val = g2[yr].values[0]
    print(g1val, g2val, g1val*g2val)
    gprod.append(g1val*g2val)
df4['GProduct'] = gprod
print(df4)
Output:
3 5 15 6 1 6 4 5 20 3 1 3 Year Country2 Country1 GProduct HProduct 0 2005 Belarus Armenia 15 NaN 1 2006 Belarus Armenia 6 NaN 2 2005 Belarus Azerbaidjan 20 NaN 3 2006 Belarus Azerbaidjan 3 NaN



RE: Matrix Search and Calculation Pandas - luxlambo227 - Jan-27-2020

This is excellent stuff. Thank you very much for this help. I had another solution but was still struggling with the date /time finding in the heading. I am no trying that all Indicators (I have many) a new column is added automatically. Any idea on how to make that that dynamic?


RE: Matrix Search and Calculation Pandas - jefsummers - Jan-27-2020

In lines 22 and 24 above the indicator is compared to 'G'. Make that a variable, and you should be able to repeat that process for any indicator.