Python Forum
Pandas - Dynamic column aggregation based on another column - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Pandas - Dynamic column aggregation based on another column (/thread-25979.html)



Pandas - Dynamic column aggregation based on another column - theroadbacktonature - Apr-17-2020

An algorithm runs daily and generates a file. The file can have dynamic columns in each run.

First run : country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch
Second run: country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch,3rd_purch
Third run : country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch,3rd_purch,margin_amt

Only the 'amt' columns should be divided by exchange_rate and agregated (sum). Remaining columns can be aggregated (sum) as it is.

Below is input (not output):

Output:
country date sale_amt profit_amt 1st_purch 2nd_purch exchange_rate US 2019-12-01 100 10 11 21 2 US 2019-12-02 200 20 12 22 2 US 2019-12-03 300 30 13 23 2 US 2019-12-04 400 40 14 34 2 US 2019-12-05 500 50 15 25 2
Desired Output:

Output:
country sale_amt profit_amt 1st_purch 2nd_purch US 750 75 65 125
Quote:Logic:
sale_amt=sum(sale_amt/exchange_rate)
profit_amt=sum(profit_amt/exchange_rate)
1st_purch=sum(1st_purch)
2nd_purch=sum(2nd_purch)

I wrote below code snippet, but unable to generate the correct expression to include division by exchange_rate for currency columns.
columns = df.columns
groupbyColumns = ["country"]
columnNotRequiredAgg=["date","exchange_rate,"] #
aggCols = list(set(columns) - set(columnNotRequiredAgg))

expr = {x:'sum' if 'amt' in x else 'sum' for x in aggCols } <<-- how to write correct logic
df.groupby(groupbyColumns).agg(expr)