Python Forum

Full Version: Pandas: summing columns conditional on the column labels
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello everyone. I am using an input and output dataset to calculate the summation of columns conditional on the name of columns. For example, the data in the dataframe looks as below.

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ...
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ...

What I want to do is to add an additional column that will sum all columns that has China or Canada as a column label for each industry. In other words, if I add correctly, I should have the below dataframe:

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ... | Csum1 | Csum2 | Csum3
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ... | 6 | 5 | 4

To do this, I thought that I should first create the additional columns in the dataset for each industry using for loop.
Then, I would add another loop in this look to check if each column's label contains the China or Canada.
for x in rance(1,4):
    locals()[df["Csum"+str(x)]] = 0
    for y in [US Canada China]:
        #I was not sure how I can approach this part...
Could anyone give me an advice as to how I can approach this issue? Thank you in advance for your help!