Sep-10-2020, 05:58 PM
Hello everyone. I am using an input and output dataset to calculate the summation of columns conditional on the name of columns. For example, the data in the dataframe looks as below.
Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ...
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ...
What I want to do is to add an additional column that will sum all columns that has China or Canada as a column label for each industry. In other words, if I add correctly, I should have the below dataframe:
Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ... | Csum1 | Csum2 | Csum3
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ... | 6 | 5 | 4
To do this, I thought that I should first create the additional columns in the dataset for each industry using for loop.
Then, I would add another loop in this look to check if each column's label contains the China or Canada.
Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ...
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ...
What I want to do is to add an additional column that will sum all columns that has China or Canada as a column label for each industry. In other words, if I add correctly, I should have the below dataframe:
Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ... | Csum1 | Csum2 | Csum3
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ... | 6 | 5 | 4
To do this, I thought that I should first create the additional columns in the dataset for each industry using for loop.
Then, I would add another loop in this look to check if each column's label contains the China or Canada.
for x in rance(1,4): locals()[df["Csum"+str(x)]] = 0 for y in [US Canada China]: #I was not sure how I can approach this part...Could anyone give me an advice as to how I can approach this issue? Thank you in advance for your help!