Python Forum
Pandas: summing columns conditional on the column labels - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Pandas: summing columns conditional on the column labels (/thread-29574.html)



Pandas: summing columns conditional on the column labels - ddd2332 - Sep-10-2020

Hello everyone. I am using an input and output dataset to calculate the summation of columns conditional on the name of columns. For example, the data in the dataframe looks as below.

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ...
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ...

What I want to do is to add an additional column that will sum all columns that has China or Canada as a column label for each industry. In other words, if I add correctly, I should have the below dataframe:

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ... | Csum1 | Csum2 | Csum3
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ... | 6 | 5 | 4

To do this, I thought that I should first create the additional columns in the dataset for each industry using for loop.
Then, I would add another loop in this look to check if each column's label contains the China or Canada.
for x in rance(1,4):
    locals()[df["Csum"+str(x)]] = 0
    for y in [US Canada China]:
        #I was not sure how I can approach this part...
Could anyone give me an advice as to how I can approach this issue? Thank you in advance for your help!