Feb-04-2022, 11:10 PM
I have to create several summary tables; two are shown below for simplicity. I'd like to minimize the code rather than type out 10 of these blocks. Is there a straightforward way to iterate over the groupBy columns? The first run uses col1/col2, the second uses col3/col4, and so on.
I'm somewhat new to Python, so appreciate any advice!
# F is assumed to be the usual alias for pyspark.sql.functions
from pyspark.sql import functions as F

# Summary 1: group by col1/col2 and sum col5..col8
NEED1 = (
    HAVE.groupBy('col1', 'col2')
    .agg(F.sum('col5').alias('col5'),
         F.sum('col6').alias('col6'),
         F.sum('col7').alias('col7'),
         F.sum('col8').alias('col8'))
    .sort('col1', 'col2')
)

# Summary 2: same sums, grouped by col3/col4
NEED2 = (
    HAVE.groupBy('col3', 'col4')
    .agg(F.sum('col5').alias('col5'),
         F.sum('col6').alias('col6'),
         F.sum('col7').alias('col7'),
         F.sum('col8').alias('col8'))
    .sort('col3', 'col4')
)

NEED1.show()
NEED2.show()
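One possible way to avoid repeating the block is to move the shared groupBy/agg/sort chain into a small helper and loop over the column pairs. The sketch below is only an illustration: `summarize`, `GROUPINGS`, and `SUM_COLS` are hypothetical names that do not appear in the post, and it assumes `HAVE` is an existing PySpark DataFrame and that every summary sums the same four columns.

```python
def summarize(df, group_cols, sum_cols):
    """Group df by group_cols, sum each of sum_cols, and sort by the group columns.

    Hypothetical helper; df is assumed to be a PySpark DataFrame.
    """
    # Imported here so that defining the helper does not itself need PySpark.
    from pyspark.sql import functions as F

    # One sum per column, aliased back to the original column name.
    aggs = [F.sum(c).alias(c) for c in sum_cols]
    return df.groupBy(*group_cols).agg(*aggs).sort(*group_cols)


# Each tuple is one set of groupBy columns; extend the list with the other pairs.
GROUPINGS = [("col1", "col2"), ("col3", "col4")]
SUM_COLS = ["col5", "col6", "col7", "col8"]

# Usage, assuming HAVE already exists:
# NEED = {cols: summarize(HAVE, cols, SUM_COLS) for cols in GROUPINGS}
# for result in NEED.values():
#     result.show()
```

Keeping the results in a dictionary keyed by the grouping columns is just one choice; a plain list would work as well if the summaries are only displayed in order.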