Jun-21-2024, 10:36 PM
I cannot turn this:
Output:Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
Into this:Output:Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7
But I can get close:import pandas as pd from io import StringIO data = """Date id class amount 1/12/2002 1 L 5 1/12/2002 1 H 6 1/12/2002 1 L 7 1/12/2002 2 L 3 1/01/2003 1 L 2 1/01/2003 1 H 8 1/01/2003 2 H 7 1/01/2003 2 L 9 1/01/2003 2 L 1 """ df = pd.read_csv(StringIO(data), sep=" ") print("It is easy to go from this", df, sep="\n") df2 = df.groupby(["Date", "id", "class"]).agg({"amount": "sum"}) print("\nTo this", df2, sep="\n") print("\nor this.", df2.reset_index(), sep="\n") df3 = pd.pivot_table(df, index=["Date", "id"], columns=["class"], values=["amount"]) print("\nBut neither of those have H and L as columns. Maybe a pivot table is a better idea.", df3, sep="\n") df4 = df3.reset_index() print( "\nNow we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.", df4, sep="\n", ) df4.columns = ["Date", "id", "H", "L"] print("\nAnd flatten the column index.", df4, sep="\n")
Output:It is easy to go from this
Date id class amount
0 1/12/2002 1 L 5
1 1/12/2002 1 H 6
2 1/12/2002 1 L 7
3 1/12/2002 2 L 3
4 1/01/2003 1 L 2
5 1/01/2003 1 H 8
6 1/01/2003 2 H 7
7 1/01/2003 2 L 9
8 1/01/2003 2 L 1
To this
amount
Date id class
1/01/2003 1 H 8
L 2
2 H 7
L 10
1/12/2002 1 H 6
L 12
2 L 3
or this.
Date id class amount
0 1/01/2003 1 H 8
1 1/01/2003 1 L 2
2 1/01/2003 2 H 7
3 1/01/2003 2 L 10
4 1/12/2002 1 H 6
5 1/12/2002 1 L 12
6 1/12/2002 2 L 3
But neither of those have H and L as columns. Maybe a pivot table is a better idea.
amount
class H L
Date id
1/01/2003 1 8.0 2.0
2 7.0 5.0
1/12/2002 1 6.0 6.0
2 NaN 3.0
Now we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.
Date id amount
class H L
0 1/01/2003 1 8.0 2.0
1 1/01/2003 2 7.0 5.0
2 1/12/2002 1 6.0 6.0
3 1/12/2002 2 NaN 3.0
And flatten the column index.
Date id H L
0 1/01/2003 1 8.0 2.0
1 1/01/2003 2 7.0 5.0
2 1/12/2002 1 6.0 6.0
3 1/12/2002 2 NaN 3.0