Pandas: summing columns conditional on the column labels
Hello everyone. I am working with an input and output dataset and want to sum columns conditional on their column labels. For example, the data in the dataframe looks like this:

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ...
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ...

What I want to do is add one additional column per industry that sums all columns whose label contains China or Canada for that industry (for example, Csum1 = Canada1 + China1). In other words, if I have added correctly, I should get the dataframe below:

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ... | Csum1 | Csum2 | Csum3
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ... | 6 | 5 | 4

To do this, I thought that I should first create the additional columns in the dataset for each industry using a for loop.
Then, I would add another loop inside this loop to check whether each column's label contains China or Canada.
for x in range(1, 4):
    # Create the new column for industry x, starting at zero
    df["Csum" + str(x)] = 0
    for y in ["US", "Canada", "China"]:
        # I was not sure how I can approach this part...
        pass
Could anyone give me some advice on how I can approach this issue? Thank you in advance for your help!
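
One possible way to finish the loop described above is sketched here. It is only a sketch under some assumptions: the column labels are taken to be exactly a country name followed by the industry number (Canada1, China1, and so on), the sample frame is rebuilt from the visible part of the table in the question, and the names `prefixes` and `cols` are made up for illustration. Rather than checking every label inside an inner loop, it builds the list of matching labels for each industry and lets pandas sum them row by row.

```python
import pandas as pd

# Rebuild the visible part of the sample table from the question.
countries = ["US", "Canada", "China"]
rows = [(c, i) for c in countries for i in range(1, 4)]
df = pd.DataFrame(rows, columns=["Country", "IndustryCode"])

values = {"US1": 1, "US2": 4, "US3": 10,
          "Canada1": 3, "Canada2": 4, "Canada3": 1,
          "China1": 3, "China2": 1, "China3": 3}
for col, val in values.items():
    df[col] = val  # every row holds the same value, as in the example

# Assumption: only Canada and China columns go into the sum.
prefixes = ["Canada", "China"]

for x in range(1, 4):
    # Labels for industry x, e.g. ["Canada1", "China1"]
    cols = [p + str(x) for p in prefixes]
    # Row-wise sum across the selected columns
    df["Csum" + str(x)] = df[cols].sum(axis=1)

print(df[["Country", "IndustryCode", "Csum1", "Csum2", "Csum3"]])
```

With the sample values this gives 6, 5 and 4 in Csum1, Csum2 and Csum3 on every row, matching the expected table. If the country list is long or not known ahead of time, selecting columns by pattern with `df.filter(regex=...)` might be a reasonable alternative to building the label list by hand.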