Apr-17-2020, 04:54 PM
An algorithm runs daily and generates a file. The file can have dynamic columns in each run.
First run : country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch
Second run: country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch,3rd_purch
Third run : country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch,3rd_purch,margin_amt
Only the 'amt' columns should be divided by exchange_rate and agregated (sum). Remaining columns can be aggregated (sum) as it is.
Below is input (not output):
I wrote below code snippet, but unable to generate the correct expression to include division by exchange_rate for currency columns.
First run : country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch
Second run: country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch,3rd_purch
Third run : country,date,exchange_rate,sale_amt,profit_amt,1st_purch,2nd_purch,3rd_purch,margin_amt
Only the 'amt' columns should be divided by exchange_rate and agregated (sum). Remaining columns can be aggregated (sum) as it is.
Below is input (not output):
Output:country date sale_amt profit_amt 1st_purch 2nd_purch exchange_rate
US 2019-12-01 100 10 11 21 2
US 2019-12-02 200 20 12 22 2
US 2019-12-03 300 30 13 23 2
US 2019-12-04 400 40 14 34 2
US 2019-12-05 500 50 15 25 2
Desired Output:Output:country sale_amt profit_amt 1st_purch 2nd_purch
US 750 75 65 125
Quote:Logic:
sale_amt=sum(sale_amt/exchange_rate)
profit_amt=sum(profit_amt/exchange_rate)
1st_purch=sum(1st_purch)
2nd_purch=sum(2nd_purch)
I wrote below code snippet, but unable to generate the correct expression to include division by exchange_rate for currency columns.
columns = df.columns groupbyColumns = ["country"] columnNotRequiredAgg=["date","exchange_rate,"] # aggCols = list(set(columns) - set(columnNotRequiredAgg)) expr = {x:'sum' if 'amt' in x else 'sum' for x in aggCols } <<-- how to write correct logic df.groupby(groupbyColumns).agg(expr)