Python Forum
Pandas: summing columns conditional on the column labels
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Pandas: summing columns conditional on the column labels
#1
Hello everyone. I am using an input and output dataset to calculate the summation of columns conditional on the name of columns. For example, the data in the dataframe looks as below.

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ...
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ...
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ...

What I want to do is to add an additional column that will sum all columns that has China or Canada as a column label for each industry. In other words, if I add correctly, I should have the below dataframe:

Country | IndustryCode | US1 | US2 | US3 | Canada1 | Canada2 | Canada3 | China1 | China2 | China3 | ... | Csum1 | Csum2 | Csum3
US | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
US | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
Canada | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 1 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 2 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
China | 3 | 1 | 4 | 10 | 3 | 4 | 1 | 3 | 1 | 3 | ... | 6 | 5 | 4
... | ... | .. | .. | .. | .. | .. | .. | .. | .. | .. | ... | 6 | 5 | 4

To do this, I thought that I should first create the additional columns in the dataset for each industry using for loop.
Then, I would add another loop in this look to check if each column's label contains the China or Canada.
for x in rance(1,4):
    locals()[df["Csum"+str(x)]] = 0
    for y in [US Canada China]:
        #I was not sure how I can approach this part...
Could anyone give me an advice as to how I can approach this issue? Thank you in advance for your help!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Parsing and summing time deltas (duration) onto bar + pie charts using pandas - - DRY Drone4four 2 473 Feb-10-2024, 06:04 PM
Last Post: Drone4four
  Assigning conditional values in Pandas Scott 3 700 Dec-19-2023, 03:10 AM
Last Post: Larz60+
  concat 3 columns of dataframe to one column flash77 2 753 Oct-03-2023, 09:29 PM
Last Post: flash77
  HTML Decoder pandas dataframe column mbrown009 3 932 Sep-29-2023, 05:56 PM
Last Post: deanhystad
  xlswriter(pandas) and conditional format paul18fr 1 1,130 Aug-28-2022, 07:56 AM
Last Post: paul18fr
  pandas column percentile nuncio 7 2,357 Aug-10-2022, 04:41 AM
Last Post: nuncio
  pandas: Compute the % of the unique values in a column JaneTan 1 1,747 Oct-25-2021, 07:55 PM
Last Post: jefsummers
  Pandas Data frame column condition check based on length of the value aditi06 1 2,637 Jul-28-2021, 11:08 AM
Last Post: jefsummers
  How to move each team row to a new column. Pandas vladiwnl 0 1,688 Jun-13-2021, 08:10 AM
Last Post: vladiwnl
  iretate over columns in df and calculate euclidean distance with one column in pandas Pit292 0 3,250 May-09-2021, 06:46 PM
Last Post: Pit292

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020