Python Forum
Create new column in dataframe
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Create new column in dataframe
#1
Hi everyone,

I have a dataframe I am trying to manipulate to create two new columns and drop 1 based on the sum function the does a conditional group by and I cannot find any example online. Below is the dataframe I have:

Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1

I'm looking to create a new column, L_class that sums the amount grouped by date and ID when class = 'L' and then do the same thing for H_class when class = 'H' and then drop the class field. the data frame should look like this.

Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7

I've tried googling how to do it and I have a function that does the calculation but not one that can create a new variable based on the condition. Does anyone have any ideas or know of an example online?

Thanks for any help
Reply
#2
start here:
create new columns derived from existing columns
Reply
#3
Thank you @Larz60+

I know how to create a new column its more trying to do everything at once in a clean manner. I've decided to break it down into steps and transpose the columns out

I had to pivot the column out and then left join the pivots back onto the dataframe then do a multi variable groupby sum with transform to create a new variable. I then had to drop all the columns I did not need and do a drop duplicates. It was not pretty but it worked.
Reply
#4
post your code (using bbcode tags) so we can help.
Reply
#5
Apologies for the delay, it is on my work computer and they won't let me send python files from work to home computer so I had to retype it.

I have imported a CSV on my local drive but I've included the URL to the dataset so you can download it and just update that piece of code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#https://opendata-nzta.opendata.arcgis.com/datasets/NZTA::tms-daily-traffic-counts-api/explore?showTable=true
path = r'C:\Users\SGrah\OneDrive\Documents\Python Scripts\Data\Economics data\TMS_Telemetry_Sites_4135033762041494366.csv' 

df = pd.read_csv(path, header=0, usecols=[0,1,2,3,4,5,6,7,8,9], index_col=[0])
df.head()

df2['Start Date'] =pd.to_datetime(df['Start Date'])

df2['Day'] = df2['Start Date'].dt.day
df2['Month'] = df2['Start Date'].dt.month
df2['Year'] = df2['Start Date'].dt.year

df2['Date'] = pd.to_datetime(df['Year'].astype(str) +
                                        df2['Month'].astype(str), format='%Y%m')

df2.drop(['Day','Month','Year'], axis=1, inplace=True)
df2['Date'] = pd.to_datetime(df2['Date'], format='%Y-%m-%d')
df.head()

trans = df2.pivot(columns='Class Weight', values='Traffic Count')
print(trans)

df3 = df2.merge(trans, how='left', on='OBJECTID')

df3['Heavy_sum'] = df3.groupby(['Site Reference', 'Date'])['Heavy'].transform('sum')
df3['Light_sum'] = df3.groupby(['Site Reference', 'Date'])['Light'].transform('sum')
df3.head()
Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read.

Thanks heaps
Reply
#6
post error
Reply
#7
Scott Wrote:Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read.
You can use the output tags, (export excel data as csv and post that).
Reply
#8
I cannot turn this:
Output:
Date id class amount 1/12/2002 1 L 5 1/12/2002 1 H 6 1/12/2002 1 L 7 1/12/2002 2 L 3 1/01/2003 1 L 2 1/01/2003 1 H 8 1/01/2003 2 H 7 1/01/2003 2 L 9 1/01/2003 2 L 1
Into this:
Output:
Date id L_class H_class 1/12/2002 1 12 6 1/12/2002 2 3 1/01/2003 1 2 8 1/01/2003 2 10 7
But I can get close:
import pandas as pd
from io import StringIO


data = """Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
"""
df = pd.read_csv(StringIO(data), sep=" ")
print("It is easy to go from this", df, sep="\n")

df2 = df.groupby(["Date", "id", "class"]).agg({"amount": "sum"})
print("\nTo this", df2, sep="\n")

print("\nor this.", df2.reset_index(), sep="\n")

df3 = pd.pivot_table(df, index=["Date", "id"], columns=["class"], values=["amount"])
print("\nBut neither of those have H and L as columns.  Maybe a pivot table is a better idea.", df3, sep="\n")

df4 = df3.reset_index()
print(
    "\nNow we have H and L columns, but both the rows and columns are multi-index.  Flatten the row index.",
    df4,
    sep="\n",
)

df4.columns = ["Date", "id", "H", "L"]
print("\nAnd flatten the column index.", df4, sep="\n")
Output:
It is easy to go from this Date id class amount 0 1/12/2002 1 L 5 1 1/12/2002 1 H 6 2 1/12/2002 1 L 7 3 1/12/2002 2 L 3 4 1/01/2003 1 L 2 5 1/01/2003 1 H 8 6 1/01/2003 2 H 7 7 1/01/2003 2 L 9 8 1/01/2003 2 L 1 To this amount Date id class 1/01/2003 1 H 8 L 2 2 H 7 L 10 1/12/2002 1 H 6 L 12 2 L 3 or this. Date id class amount 0 1/01/2003 1 H 8 1 1/01/2003 1 L 2 2 1/01/2003 2 H 7 3 1/01/2003 2 L 10 4 1/12/2002 1 H 6 5 1/12/2002 1 L 12 6 1/12/2002 2 L 3 But neither of those have H and L as columns. Maybe a pivot table is a better idea. amount class H L Date id 1/01/2003 1 8.0 2.0 2 7.0 5.0 1/12/2002 1 6.0 6.0 2 NaN 3.0 Now we have H and L columns, but both the rows and columns are multi-index. Flatten the row index. Date id amount class H L 0 1/01/2003 1 8.0 2.0 1 1/01/2003 2 7.0 5.0 2 1/12/2002 1 6.0 6.0 3 1/12/2002 2 NaN 3.0 And flatten the column index. Date id H L 0 1/01/2003 1 8.0 2.0 1 1/01/2003 2 7.0 5.0 2 1/12/2002 1 6.0 6.0 3 1/12/2002 2 NaN 3.0
Pedroski55 likes this post
Reply
#9
More fun than Suduko on a Sunday morning!

Is this what you want?

I hope the experts here can make this shorter!

import pandas as pd

cols = ['Date', 'id', 'class', 'amount']
string = '''1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1'''

# given the above data, make a df
a = string.split('\n')
b = [c.split() for c in a]
df = pd.DataFrame(b, columns=cols)
# amount is a string, change to numeric
df["amount"] = pd.to_numeric(df["amount"])
# get all the Ls
dfL = df.loc[df['class'] == 'L'] .copy()
# rename amount to L_class since that is the desired output format
dfL = dfL.rename(columns={'amount': 'L_class'})
# get all Hs
dfH = df.loc[df['class'] == 'H'].copy()
# rename amount to H_class since that is the desired output format
dfH = dfH.rename(columns={'amount': 'H_class'})
# merge dfL and dfH
new = pd.merge(dfL, dfH, how='outer', left_on=['Date', 'id', "class"], right_on = ['Date', 'id', "class"]) # looks good
# do the groupby and add the numbers in L_class and H_class
HL = new.groupby(["Date", "id", "class"]).agg({"L_class": "sum", "H_class": "sum"}).copy()
Gives this:

Output:
HL L_class H_class Date id class 1/01/2003 1 H 0.0 8.0 L 2.0 0.0 2 H 0.0 7.0 L 10.0 0.0 1/12/2002 1 H 0.0 6.0 L 12.0 0.0 2 L 3.0 0.0
Reply
#10
import pandas as pd

data = {
    'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'],
    'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
    'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
    'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1]
}

df = pd.DataFrame(data)
# Pivot the DataFrame
pivot_df = df.pivot_table(index=['Date', 'id'], columns='class', values='amount', aggfunc='sum', fill_value=0)
pivot_df.columns = [f'{col}_class' for col in pivot_df.columns]
# Reset index to get Date and id back as columns
pivot_df = pivot_df.reset_index()
print(pivot_df)
Output:
Date id H_class L_class 0 1/01/2003 1 8 2 1 1/01/2003 2 7 10 2 1/12/2002 1 6 12 3 1/12/2002 2 0 3
Pedroski55 likes this post
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  attempt to split values from within a dataframe column mbrown009 9 2,748 Jun-20-2024, 07:59 PM
Last Post: AdamHensley
  Putting column name to dataframe, can't work. jonah88888 2 2,022 Jun-18-2024, 09:19 PM
Last Post: AdamHensley
  concat 3 columns of dataframe to one column flash77 2 1,055 Oct-03-2023, 09:29 PM
Last Post: flash77
  HTML Decoder pandas dataframe column mbrown009 3 1,274 Sep-29-2023, 05:56 PM
Last Post: deanhystad
  New Dataframe Column Based on Several Conditions nb1214 1 1,925 Nov-16-2021, 10:52 PM
Last Post: jefsummers
  Setting the x-axis to a specific column in a dataframe devansing 0 2,146 May-23-2021, 12:11 AM
Last Post: devansing
Question [Solved] How to refer to dataframe column name based on a list lorensa74 1 2,429 May-17-2021, 07:02 AM
Last Post: lorensa74
Question Pandas - Creating additional column in dataframe from another column Azureaus 2 3,123 Jan-11-2021, 09:53 PM
Last Post: Azureaus
  Filter data based on a value from another dataframe column and create a file using lo pawanmtm 1 4,411 Jul-15-2020, 06:20 PM
Last Post: pawanmtm
  Pandas DataFrame and unmatched column sritsv19 0 3,148 Jul-07-2020, 12:52 PM
Last Post: sritsv19

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020