Posts: 101
Threads: 48
Joined: May 2018
Hi everyone,
I have a dataframe I'm trying to manipulate: I want to create two new columns and drop one, based on a sum that does a conditional group by, and I can't find any example online. Below is the dataframe I have:
Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
I'm looking to create a new column, L_class, that sums amount grouped by Date and id when class = 'L', then do the same for H_class when class = 'H', and then drop the class field. The dataframe should look like this:
Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7
I've tried googling how to do it, and I have a function that does the calculation, but not one that can create a new variable based on the condition. Does anyone have any ideas, or know of an example online?
Thanks for any help
Posts: 12,054
Threads: 488
Joined: Sep 2016
Posts: 101
Threads: 48
Joined: May 2018
Thank you @Larz60+
I know how to create a new column; it's more about doing everything at once in a clean manner. I've decided to break it down into steps and transpose the columns out.
I had to pivot the column out, left join the pivots back onto the dataframe, then do a multi-variable groupby sum with transform to create a new variable. I then had to drop all the columns I did not need and do a drop_duplicates. It was not pretty, but it worked.
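A rough sketch of those steps on the toy data from my first post (a hypothetical stand-in; the real CSV has different column names):

```python
import pandas as pd

# Toy data from my first post (hypothetical stand-in for the real CSV)
df = pd.DataFrame({
    'Date': ['1/12/2002'] * 4 + ['1/01/2003'] * 5,
    'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
    'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
    'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1],
})

# 1. pivot the class column out into separate H/L columns
wide = df.pivot(columns='class', values='amount')

# 2. left join the pivoted columns back onto the original frame (by index)
df = df.join(wide)

# 3. groupby sum with transform to create the new variables
df['L_class'] = df.groupby(['Date', 'id'])['L'].transform('sum')
df['H_class'] = df.groupby(['Date', 'id'])['H'].transform('sum')

# 4. drop the columns I no longer need, then drop duplicates
out = (df.drop(columns=['class', 'amount', 'H', 'L'])
         .drop_duplicates()
         .reset_index(drop=True))
print(out)
```

Note that with this approach a (Date, id) group that has no 'H' rows ends up with H_class = 0.0 rather than blank.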
Posts: 12,054
Threads: 488
Joined: Sep 2016
post your code (using bbcode tags) so we can help.
Posts: 101
Threads: 48
Joined: May 2018
Jun-21-2024, 05:09 AM
(This post was last modified: Jun-21-2024, 05:21 AM by Scott.)
Apologies for the delay; it's on my work computer and they won't let me send Python files from work to my home computer, so I had to retype it.
I have imported a CSV from my local drive, but I've included the URL to the dataset so you can download it and just update that piece of code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# https://opendata-nzta.opendata.arcgis.com/datasets/NZTA::tms-daily-traffic-counts-api/explore?showTable=true
path = r'C:\Users\SGrah\OneDrive\Documents\Python Scripts\Data\Economics data\TMS_Telemetry_Sites_4135033762041494366.csv'
df = pd.read_csv(path, header=0, usecols=[0,1,2,3,4,5,6,7,8,9], index_col=[0])
df.head()

# work on a copy and build a month-level Date column
df2 = df.copy()
df2['Start Date'] = pd.to_datetime(df2['Start Date'])
df2['Day'] = df2['Start Date'].dt.day
df2['Month'] = df2['Start Date'].dt.month
df2['Year'] = df2['Start Date'].dt.year
# zero-pad the month so '%Y%m' parses single-digit months correctly
df2['Date'] = pd.to_datetime(df2['Year'].astype(str) +
                             df2['Month'].astype(str).str.zfill(2), format='%Y%m')
df2.drop(['Day', 'Month', 'Year'], axis=1, inplace=True)
df2.head()

# pivot the weight class out, join it back on, then sum per site and month
trans = df2.pivot(columns='Class Weight', values='Traffic Count')
print(trans)
df3 = df2.merge(trans, how='left', on='OBJECTID')
df3['Heavy_sum'] = df3.groupby(['Site Reference', 'Date'])['Heavy'].transform('sum')
df3['Light_sum'] = df3.groupby(['Site Reference', 'Date'])['Light'].transform('sum')
df3.head()
Also, is there a way to paste tables from Excel neatly into this forum (like with the code above)? If you look at my first post in this thread, the tables are hard to read.
Thanks heaps
Posts: 101
Threads: 48
Joined: May 2018
Jun-21-2024, 05:20 AM
(This post was last modified: Jun-21-2024, 05:22 AM by Scott.)
Posts: 12,054
Threads: 488
Joined: Sep 2016
Scott Wrote: "Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read."
You can use the output tags (export the Excel data as CSV and post that).
Posts: 6,831
Threads: 20
Joined: Feb 2020
I cannot turn this:
Output: Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
Into this:
Output: Date id L_class H_class
1/12/2002 1 12 6
1/12/2002 2 3
1/01/2003 1 2 8
1/01/2003 2 10 7
But I can get close:
import pandas as pd
from io import StringIO
data = """Date id class amount
1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1
"""
df = pd.read_csv(StringIO(data), sep=" ")
print("It is easy to go from this", df, sep="\n")
df2 = df.groupby(["Date", "id", "class"]).agg({"amount": "sum"})
print("\nTo this", df2, sep="\n")
print("\nor this.", df2.reset_index(), sep="\n")
df3 = pd.pivot_table(df, index=["Date", "id"], columns=["class"], values=["amount"])
print("\nBut neither of those have H and L as columns. Maybe a pivot table is a better idea.", df3, sep="\n")
df4 = df3.reset_index()
print(
"\nNow we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.",
df4,
sep="\n",
)
df4.columns = ["Date", "id", "H", "L"]
print("\nAnd flatten the column index.", df4, sep="\n")
Output: It is easy to go from this
Date id class amount
0 1/12/2002 1 L 5
1 1/12/2002 1 H 6
2 1/12/2002 1 L 7
3 1/12/2002 2 L 3
4 1/01/2003 1 L 2
5 1/01/2003 1 H 8
6 1/01/2003 2 H 7
7 1/01/2003 2 L 9
8 1/01/2003 2 L 1
To this
amount
Date id class
1/01/2003 1 H 8
L 2
2 H 7
L 10
1/12/2002 1 H 6
L 12
2 L 3
or this.
Date id class amount
0 1/01/2003 1 H 8
1 1/01/2003 1 L 2
2 1/01/2003 2 H 7
3 1/01/2003 2 L 10
4 1/12/2002 1 H 6
5 1/12/2002 1 L 12
6 1/12/2002 2 L 3
But neither of those have H and L as columns. Maybe a pivot table is a better idea.
amount
class H L
Date id
1/01/2003 1 8.0 2.0
2 7.0 5.0
1/12/2002 1 6.0 6.0
2 NaN 3.0
Now we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.
Date id amount
class H L
0 1/01/2003 1 8.0 2.0
1 1/01/2003 2 7.0 5.0
2 1/12/2002 1 6.0 6.0
3 1/12/2002 2 NaN 3.0
And flatten the column index.
Date id H L
0 1/01/2003 1 8.0 2.0
1 1/01/2003 2 7.0 5.0
2 1/12/2002 1 6.0 6.0
3 1/12/2002 2 NaN 3.0
Pedroski55 likes this post
Posts: 1,096
Threads: 143
Joined: Jul 2017
More fun than Sudoku on a Sunday morning!
Is this what you want?
I hope the experts here can make this shorter!
import pandas as pd
cols = ['Date', 'id', 'class', 'amount']
string = '''1/12/2002 1 L 5
1/12/2002 1 H 6
1/12/2002 1 L 7
1/12/2002 2 L 3
1/01/2003 1 L 2
1/01/2003 1 H 8
1/01/2003 2 H 7
1/01/2003 2 L 9
1/01/2003 2 L 1'''
# given the above data, make a df
a = string.split('\n')
b = [c.split() for c in a]
df = pd.DataFrame(b, columns=cols)
# amount is a string, change to numeric
df["amount"] = pd.to_numeric(df["amount"])
# get all the Ls
dfL = df.loc[df['class'] == 'L'].copy()
# rename amount to L_class since that is the desired output format
dfL = dfL.rename(columns={'amount': 'L_class'})
# get all Hs
dfH = df.loc[df['class'] == 'H'].copy()
# rename amount to H_class since that is the desired output format
dfH = dfH.rename(columns={'amount': 'H_class'})
# merge dfL and dfH
new = pd.merge(dfL, dfH, how='outer', on=['Date', 'id', 'class'])  # looks good
# do the groupby and add the numbers in L_class and H_class
HL = new.groupby(["Date", "id", "class"]).agg({"L_class": "sum", "H_class": "sum"}).copy()
Gives this:
Output: HL
L_class H_class
Date id class
1/01/2003 1 H 0.0 8.0
L 2.0 0.0
2 H 0.0 7.0
L 10.0 0.0
1/12/2002 1 H 0.0 6.0
L 12.0 0.0
2 L 3.0 0.0
Posts: 7,329
Threads: 123
Joined: Sep 2016
Jun-23-2024, 08:52 PM
(This post was last modified: Jun-23-2024, 08:52 PM by snippsat.)
import pandas as pd
data = {
'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'],
'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1]
}
df = pd.DataFrame(data)
# Pivot the DataFrame
pivot_df = df.pivot_table(index=['Date', 'id'], columns='class', values='amount', aggfunc='sum', fill_value=0)
pivot_df.columns = [f'{col}_class' for col in pivot_df.columns]
# Reset index to get Date and id back as columns
pivot_df = pivot_df.reset_index()
print(pivot_df)
Output: Date id H_class L_class
0 1/01/2003 1 8 2
1 1/01/2003 2 7 10
2 1/12/2002 1 6 12
3 1/12/2002 2 0 3
Pedroski55 likes this post
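For comparison, the same wide table can be sketched with groupby/unstack instead of pivot_table, on the same inline data as above:

```python
import pandas as pd

data = {
    'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002',
             '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'],
    'id': [1, 1, 1, 2, 1, 1, 2, 2, 2],
    'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'],
    'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1],
}
df = pd.DataFrame(data)

# Sum per (Date, id, class), then move class values into the columns
wide = (df.groupby(['Date', 'id', 'class'])['amount']
          .sum()
          .unstack('class', fill_value=0)   # missing combinations become 0
          .add_suffix('_class')             # H -> H_class, L -> L_class
          .reset_index())
wide.columns.name = None
print(wide)
```

This gives the same H_class/L_class columns as the pivot_table version; which one reads better is mostly a matter of taste.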