Python Forum
Create new column in dataframe - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Create new column in dataframe (/thread-42308.html)



Create new column in dataframe - Scott - Jun-14-2024

Hi everyone,

I have a dataframe I am trying to manipulate to create two new columns and drop 1 based on the sum function the does a conditional group by and I cannot find any example online. Below is the dataframe I have:

Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1

I'm looking to create a new column, L_class that sums the amount grouped by date and ID when class = 'L' and then do the same thing for H_class when class = 'H' and then drop the class field. the data frame should look like this.

Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7

I've tried googling how to do it and I have a function that does the calculation but not one that can create a new variable based on the condition. Does anyone have any ideas or know of an example online?

Thanks for any help


RE: Create new column in dataframe - Larz60+ - Jun-14-2024

start here:
create new columns derived from existing columns


RE: Create new column in dataframe - Scott - Jun-16-2024

Thank you @Larz60+

I know how to create a new column its more trying to do everything at once in a clean manner. I've decided to break it down into steps and transpose the columns out

I had to pivot the column out and then left join the pivots back onto the dataframe then do a multi variable groupby sum with transform to create a new variable. I then had to drop all the columns I did not need and do a drop duplicates. It was not pretty but it worked.


RE: Create new column in dataframe - Larz60+ - Jun-16-2024

post your code (using bbcode tags) so we can help.


RE: Create new column in dataframe - Scott - Jun-21-2024

Apologies for the delay, it is on my work computer and they won't let me send python files from work to home computer so I had to retype it.

I have imported a CSV on my local drive but I've included the URL to the dataset so you can download it and just update that piece of code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#https://opendata-nzta.opendata.arcgis.com/datasets/NZTA::tms-daily-traffic-counts-api/explore?showTable=true
path = r'C:\Users\SGrah\OneDrive\Documents\Python Scripts\Data\Economics data\TMS_Telemetry_Sites_4135033762041494366.csv' 

df = pd.read_csv(path, header=0, usecols=[0,1,2,3,4,5,6,7,8,9], index_col=[0])
df.head()

df2['Start Date'] =pd.to_datetime(df['Start Date'])

df2['Day'] = df2['Start Date'].dt.day
df2['Month'] = df2['Start Date'].dt.month
df2['Year'] = df2['Start Date'].dt.year

df2['Date'] = pd.to_datetime(df['Year'].astype(str) +
                                        df2['Month'].astype(str), format='%Y%m')

df2.drop(['Day','Month','Year'], axis=1, inplace=True)
df2['Date'] = pd.to_datetime(df2['Date'], format='%Y-%m-%d')
df.head()

trans = df2.pivot(columns='Class Weight', values='Traffic Count')
print(trans)

df3 = df2.merge(trans, how='left', on='OBJECTID')

df3['Heavy_sum'] = df3.groupby(['Site Reference', 'Date'])['Heavy'].transform('sum')
df3['Light_sum'] = df3.groupby(['Site Reference', 'Date'])['Light'].transform('sum')
df3.head()
Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read.

Thanks heaps


RE: Create new column in dataframe - Scott - Jun-21-2024

post error


RE: Create new column in dataframe - Larz60+ - Jun-21-2024

Scott Wrote:Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read.
You can use the output tags, (export excel data as csv and post that).


RE: Create new column in dataframe - deanhystad - Jun-21-2024

I cannot turn this:
Output:
Date id class amount 1/12/2002 1 L 5 1/12/2002 1 H 6 1/12/2002 1 L 7 1/12/2002 2 L 3 1/01/2003 1 L 2 1/01/2003 1 H 8 1/01/2003 2 H 7 1/01/2003 2 L 9 1/01/2003 2 L 1
Into this:
Output:
Date id L_class H_class 1/12/2002 1 12 6 1/12/2002 2 3 1/01/2003 1 2 8 1/01/2003 2 10 7
But I can get close:
import pandas as pd
from io import StringIO


data = """Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
"""
df = pd.read_csv(StringIO(data), sep=" ")
print("It is easy to go from this", df, sep="\n")

df2 = df.groupby(["Date", "id", "class"]).agg({"amount": "sum"})
print("\nTo this", df2, sep="\n")

print("\nor this.", df2.reset_index(), sep="\n")

df3 = pd.pivot_table(df, index=["Date", "id"], columns=["class"], values=["amount"])
print("\nBut neither of those have H and L as columns.  Maybe a pivot table is a better idea.", df3, sep="\n")

df4 = df3.reset_index()
print(
    "\nNow we have H and L columns, but both the rows and columns are multi-index.  Flatten the row index.",
    df4,
    sep="\n",
)

df4.columns = ["Date", "id", "H", "L"]
print("\nAnd flatten the column index.", df4, sep="\n")
Output:
It is easy to go from this Date id class amount 0 1/12/2002 1 L 5 1 1/12/2002 1 H 6 2 1/12/2002 1 L 7 3 1/12/2002 2 L 3 4 1/01/2003 1 L 2 5 1/01/2003 1 H 8 6 1/01/2003 2 H 7 7 1/01/2003 2 L 9 8 1/01/2003 2 L 1 To this amount Date id class 1/01/2003 1 H 8 L 2 2 H 7 L 10 1/12/2002 1 H 6 L 12 2 L 3 or this. Date id class amount 0 1/01/2003 1 H 8 1 1/01/2003 1 L 2 2 1/01/2003 2 H 7 3 1/01/2003 2 L 10 4 1/12/2002 1 H 6 5 1/12/2002 1 L 12 6 1/12/2002 2 L 3 But neither of those have H and L as columns. Maybe a pivot table is a better idea. amount class H L Date id 1/01/2003 1 8.0 2.0 2 7.0 5.0 1/12/2002 1 6.0 6.0 2 NaN 3.0 Now we have H and L columns, but both the rows and columns are multi-index. Flatten the row index. Date id amount class H L 0 1/01/2003 1 8.0 2.0 1 1/01/2003 2 7.0 5.0 2 1/12/2002 1 6.0 6.0 3 1/12/2002 2 NaN 3.0 And flatten the column index. Date id H L 0 1/01/2003 1 8.0 2.0 1 1/01/2003 2 7.0 5.0 2 1/12/2002 1 6.0 6.0 3 1/12/2002 2 NaN 3.0



RE: Create new column in dataframe - Pedroski55 - Jun-23-2024

More fun than Suduko on a Sunday morning!

Is this what you want?

I hope the experts here can make this shorter!

import pandas as pd

cols = ['Date', 'id', 'class', 'amount']
string = '''1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1'''

# given the above data, make a df
a = string.split('\n')
b = [c.split() for c in a]
df = pd.DataFrame(b, columns=cols)
# amount is a string, change to numeric
df["amount"] = pd.to_numeric(df["amount"])
# get all the Ls
dfL = df.loc[df['class'] == 'L'] .copy()
# rename amount to L_class since that is the desired output format
dfL = dfL.rename(columns={'amount': 'L_class'})
# get all Hs
dfH = df.loc[df['class'] == 'H'].copy()
# rename amount to H_class since that is the desired output format
dfH = dfH.rename(columns={'amount': 'H_class'})
# merge dfL and dfH
new = pd.merge(dfL, dfH, how='outer', left_on=['Date', 'id', "class"], right_on = ['Date', 'id', "class"]) # looks good
# do the groupby and add the numbers in L_class and H_class
HL = new.groupby(["Date", "id", "class"]).agg({"L_class": "sum", "H_class": "sum"}).copy()
Gives this:

Output:
HL L_class H_class Date id class 1/01/2003 1 H 0.0 8.0 L 2.0 0.0 2 H 0.0 7.0 L 10.0 0.0 1/12/2002 1 H 0.0 6.0 L 12.0 0.0 2 L 3.0 0.0



RE: Create new column in dataframe - snippsat - Jun-23-2024

import pandas as pd

data = {
    'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'],
    'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
    'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
    'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1]
}

df = pd.DataFrame(data)
# Pivot the DataFrame
pivot_df = df.pivot_table(index=['Date', 'id'], columns='class', values='amount', aggfunc='sum', fill_value=0)
pivot_df.columns = [f'{col}_class' for col in pivot_df.columns]
# Reset index to get Date and id back as columns
pivot_df = pivot_df.reset_index()
print(pivot_df)
Output:
Date id H_class L_class 0 1/01/2003 1 8 2 1 1/01/2003 2 7 10 2 1/12/2002 1 6 12 3 1/12/2002 2 0 3