Posts: 101
Threads: 48
Joined: May 2018
Hi everyone,
I have a dataframe I'm trying to manipulate: I want to create two new columns and drop one, based on a sum that does a conditional group by, and I can't find any example online. Below is the dataframe I have:
Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
I'm looking to create a new column, L_class, that sums amount grouped by Date and id when class = 'L', then do the same for H_class when class = 'H', and then drop the class field. The dataframe should look like this:
Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7
I've tried googling how to do it, and I have a function that does the calculation, but not one that can create a new variable based on the condition. Does anyone have any ideas, or know of an example online?
Thanks for any help
Posts: 12,054
Threads: 488
Joined: Sep 2016
Posts: 101
Threads: 48
Joined: May 2018
Thank you @Larz60+
I know how to create a new column; it's more about doing everything at once in a clean manner. I've decided to break it down into steps and transpose the columns out.
I had to pivot the column out, left join the pivots back onto the dataframe, then do a multi-variable groupby sum with transform to create a new variable. I then had to drop all the columns I did not need and do a drop_duplicates. It was not pretty, but it worked.
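A rough sketch of those steps on the toy data from my first post (a hypothetical stand-in; the real CSV has different column names):

```python
import pandas as pd

# Toy data from my first post (hypothetical stand-in for the real CSV)
df = pd.DataFrame({
    'Date': ['1/12/2002'] * 4 + ['1/01/2003'] * 5,
    'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
    'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
    'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1],
})

# 1. pivot the class column out into separate H/L columns
wide = df.pivot(columns='class', values='amount')

# 2. left join the pivoted columns back onto the original frame (by index)
df = df.join(wide)

# 3. groupby sum with transform to create the new variables
df['L_class'] = df.groupby(['Date', 'id'])['L'].transform('sum')
df['H_class'] = df.groupby(['Date', 'id'])['H'].transform('sum')

# 4. drop the columns I no longer need, then drop duplicates
out = (df.drop(columns=['class', 'amount', 'H', 'L'])
         .drop_duplicates()
         .reset_index(drop=True))
print(out)
```

Note that with this approach a (Date, id) group that has no 'H' rows ends up with H_class = 0.0 rather than blank.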
Posts: 12,054
Threads: 488
Joined: Sep 2016
post your code (using bbcode tags) so we can help.
Posts: 101
Threads: 48
Joined: May 2018
Jun-21-2024, 05:09 AM
(This post was last modified: Jun-21-2024, 05:21 AM by Scott.)
Apologies for the delay; it's on my work computer and they won't let me send Python files from work to my home computer, so I had to retype it.
I have imported a CSV from my local drive, but I've included the URL to the dataset so you can download it and just update that piece of code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# https://opendata-nzta.opendata.arcgis.com/datasets/NZTA::tms-daily-traffic-counts-api/explore?showTable=true
path = r'C:\Users\SGrah\OneDrive\Documents\Python Scripts\Data\Economics data\TMS_Telemetry_Sites_4135033762041494366.csv'
df = pd.read_csv(path, header=0, usecols=[0,1,2,3,4,5,6,7,8,9], index_col=[0])
df.head()

# work on a copy and build a month-level Date column
df2 = df.copy()
df2['Start Date'] = pd.to_datetime(df2['Start Date'])
df2['Day'] = df2['Start Date'].dt.day
df2['Month'] = df2['Start Date'].dt.month
df2['Year'] = df2['Start Date'].dt.year
# zero-pad the month so '%Y%m' parses single-digit months correctly
df2['Date'] = pd.to_datetime(df2['Year'].astype(str) +
                             df2['Month'].astype(str).str.zfill(2), format='%Y%m')
df2.drop(['Day', 'Month', 'Year'], axis=1, inplace=True)
df2.head()

# pivot the weight class out, join it back on, then sum per site and month
trans = df2.pivot(columns='Class Weight', values='Traffic Count')
print(trans)
df3 = df2.merge(trans, how='left', on='OBJECTID')
df3['Heavy_sum'] = df3.groupby(['Site Reference', 'Date'])['Heavy'].transform('sum')
df3['Light_sum'] = df3.groupby(['Site Reference', 'Date'])['Light'].transform('sum')
df3.head()
Also, is there a way to paste tables from Excel neatly into this forum (like with the code above)? If you look at my first post in this thread, the tables are hard to read.
Thanks heaps
Posts: 101
Threads: 48
Joined: May 2018
Jun-21-2024, 05:20 AM
(This post was last modified: Jun-21-2024, 05:22 AM by Scott.)
Posts: 12,054
Threads: 488
Joined: Sep 2016
Scott Wrote: "Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read."
You can use the output tags (export the Excel data as CSV and post that).
Posts: 6,831
Threads: 20
Joined: Feb 2020
I cannot turn this:
Output: Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
Into this:
Output: Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7
But I can get close:
import pandas as pd
from io import StringIO
data = """Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
"""
df = pd.read_csv(StringIO(data), sep=" ")
print("It is easy to go from this", df, sep="\n")
df2 = df.groupby(["Date", "id", "class"]).agg({"amount": "sum"})
print("\nTo this", df2, sep="\n")
print("\nor this.", df2.reset_index(), sep="\n")
df3 = pd.pivot_table(df, index=["Date", "id"], columns=["class"], values=["amount"])
print("\nBut neither of those have H and L as columns. Maybe a pivot table is a better idea.", df3, sep="\n")
df4 = df3.reset_index()
print(
"\nNow we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.",
df4,
sep="\n",
)
df4.columns = ["Date", "id", "H", "L"]
print("\nAnd flatten the column index.", df4, sep="\n")
Output: It is easy to go from this
Date id class amount
0 1/12/2002 1 L 5
1 1/12/2002 1 H 6
2 1/12/2002 1 L 7
3 1/12/2002 2 L 3
4 1/01/2003 1 L 2
5 1/01/2003 1 H 8
6 1/01/2003 2 H 7
7 1/01/2003 2 L 9
8 1/01/2003 2 L 1
To this
amount
Date id class
1/01/2003 1 H 8
L 2
2 H 7
L 10
1/12/2002 1 H 6
L 12
2 L 3
or this.
Date id class amount
0 1/01/2003 1 H 8
1 1/01/2003 1 L 2
2 1/01/2003 2 H 7
3 1/01/2003 2 L 10
4 1/12/2002 1 H 6
5 1/12/2002 1 L 12
6 1/12/2002 2 L 3
But neither of those have H and L as columns. Maybe a pivot table is a better idea.
amount
class H L
Date id
1/01/2003 1 8.0 2.0
2 7.0 5.0
1/12/2002 1 6.0 6.0
2 NaN 3.0
Now we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.
Date id amount
class H L
0 1/01/2003 1 8.0 2.0
1 1/01/2003 2 7.0 5.0
2 1/12/2002 1 6.0 6.0
3 1/12/2002 2 NaN 3.0
And flatten the column index.
Date id H L
0 1/01/2003 1 8.0 2.0
1 1/01/2003 2 7.0 5.0
2 1/12/2002 1 6.0 6.0
3 1/12/2002 2 NaN 3.0
Pedroski55 likes this post
Posts: 1,096
Threads: 143
Joined: Jul 2017
More fun than Sudoku on a Sunday morning!
Is this what you want?
I hope the experts here can make this shorter!
import pandas as pd
cols = ['Date', 'id', 'class', 'amount']
string = '''1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1'''
# given the above data, make a df
a = string.split('\n')
b = [c.split() for c in a]
df = pd.DataFrame(b, columns=cols)
# amount is a string, change to numeric
df["amount"] = pd.to_numeric(df["amount"])
# get all the Ls
dfL = df.loc[df['class'] == 'L'].copy()
# rename amount to L_class since that is the desired output format
dfL = dfL.rename(columns={'amount': 'L_class'})
# get all Hs
dfH = df.loc[df['class'] == 'H'].copy()
# rename amount to H_class since that is the desired output format
dfH = dfH.rename(columns={'amount': 'H_class'})
# merge dfL and dfH
new = pd.merge(dfL, dfH, how='outer', on=['Date', 'id', 'class'])  # looks good
# do the groupby and add the numbers in L_class and H_class
HL = new.groupby(["Date", "id", "class"]).agg({"L_class": "sum", "H_class": "sum"}).copy()
Gives this:
Output: HL
L_class H_class
Date id class
1/01/2003 1 H 0.0 8.0
L 2.0 0.0
2 H 0.0 7.0
L 10.0 0.0
1/12/2002 1 H 0.0 6.0
L 12.0 0.0
2 L 3.0 0.0
Posts: 7,329
Threads: 123
Joined: Sep 2016
Jun-23-2024, 08:52 PM
(This post was last modified: Jun-23-2024, 08:52 PM by snippsat.)
import pandas as pd
data = {
'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'],
'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1]
}
df = pd.DataFrame(data)
# Pivot the DataFrame
pivot_df = df.pivot_table(index=['Date', 'id'], columns='class', values='amount', aggfunc='sum', fill_value=0)
pivot_df.columns = [f'{col}_class' for col in pivot_df.columns]
# Reset index to get Date and id back as columns
pivot_df = pivot_df.reset_index()
print(pivot_df)
Output: Date id H_class L_class
0 1/01/2003 1 8 2
1 1/01/2003 2 7 10
2 1/12/2002 1 6 12
3 1/12/2002 2 0 3
Pedroski55 likes this post
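For comparison, the same wide table can be sketched with groupby/unstack instead of pivot_table, on the same inline data as above:

```python
import pandas as pd

data = {
    'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002',
             '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'],
    'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
    'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
    'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1],
}
df = pd.DataFrame(data)

# Sum per (Date, id, class), then move class values into the columns
wide = (df.groupby(['Date', 'id', 'class'])['amount']
          .sum()
          .unstack('class', fill_value=0)   # missing combinations become 0
          .add_suffix('_class')             # H -> H_class, L -> L_class
          .reset_index())
wide.columns.name = None
print(wide)
```

This gives the same H_class/L_class columns as the pivot_table version; which one reads better is mostly a matter of taste.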