![]() |
Create new column in dataframe - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: Create new column in dataframe (/thread-42308.html) |
Create new column in dataframe - Scott - Jun-14-2024 Hi everyone, I have a dataframe I am trying to manipulate to create two new columns and drop 1 based on the sum function the does a conditional group by and I cannot find any example online. Below is the dataframe I have: Date id class amount 1/12/2002 1 L 5 1/12/2002 1 H 6 1/12/2002 1 L 7 1/12/2002 2 L 3 1/01/2003 1 L 2 1/01/2003 1 H 8 1/01/2003 2 H 7 1/01/2003 2 L 9 1/01/2003 2 L 1 I'm looking to create a new column, L_class that sums the amount grouped by date and ID when class = 'L' and then do the same thing for H_class when class = 'H' and then drop the class field. the data frame should look like this. Date id L_class H_class 1/12/2002 1 12 6 1/12/2002 2 3 1/01/2003 1 2 8 1/01/2003 2 10 7 I've tried googling how to do it and I have a function that does the calculation but not one that can create a new variable based on the condition. Does anyone have any ideas or know of an example online? Thanks for any help RE: Create new column in dataframe - Larz60+ - Jun-14-2024 start here: create new columns derived from existing columns RE: Create new column in dataframe - Scott - Jun-16-2024 Thank you @Larz60+ I know how to create a new column its more trying to do everything at once in a clean manner. I've decided to break it down into steps and transpose the columns out I had to pivot the column out and then left join the pivots back onto the dataframe then do a multi variable groupby sum with transform to create a new variable. I then had to drop all the columns I did not need and do a drop duplicates. It was not pretty but it worked. RE: Create new column in dataframe - Larz60+ - Jun-16-2024 post your code (using bbcode tags) so we can help. RE: Create new column in dataframe - Scott - Jun-21-2024 Apologies for the delay, it is on my work computer and they won't let me send python files from work to home computer so I had to retype it. I have imported a CSV on my local drive but I've included the URL to the dataset so you can download it and just update that piece of code. import numpy as np import pandas as pd import matplotlib.pyplot as plt #https://opendata-nzta.opendata.arcgis.com/datasets/NZTA::tms-daily-traffic-counts-api/explore?showTable=true path = r'C:\Users\SGrah\OneDrive\Documents\Python Scripts\Data\Economics data\TMS_Telemetry_Sites_4135033762041494366.csv' df = pd.read_csv(path, header=0, usecols=[0,1,2,3,4,5,6,7,8,9], index_col=[0]) df.head() df2['Start Date'] =pd.to_datetime(df['Start Date']) df2['Day'] = df2['Start Date'].dt.day df2['Month'] = df2['Start Date'].dt.month df2['Year'] = df2['Start Date'].dt.year df2['Date'] = pd.to_datetime(df['Year'].astype(str) + df2['Month'].astype(str), format='%Y%m') df2.drop(['Day','Month','Year'], axis=1, inplace=True) df2['Date'] = pd.to_datetime(df2['Date'], format='%Y-%m-%d') df.head() trans = df2.pivot(columns='Class Weight', values='Traffic Count') print(trans) df3 = df2.merge(trans, how='left', on='OBJECTID') df3['Heavy_sum'] = df3.groupby(['Site Reference', 'Date'])['Heavy'].transform('sum') df3['Light_sum'] = df3.groupby(['Site Reference', 'Date'])['Light'].transform('sum') df3.head()Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read. Thanks heaps RE: Create new column in dataframe - Scott - Jun-21-2024 post error RE: Create new column in dataframe - Larz60+ - Jun-21-2024 Scott Wrote:Also is there a way to paste tables from excel neatly into this forum (like with the code above)? If you look at my first post in this thread the tables are hard to read.You can use the output tags, (export excel data as csv and post that). RE: Create new column in dataframe - deanhystad - Jun-21-2024 I cannot turn this: Into this: But I can get close:import pandas as pd from io import StringIO data = """Date id class amount 1/12/2002 1 L 5 1/12/2002 1 H 6 1/12/2002 1 L 7 1/12/2002 2 L 3 1/01/2003 1 L 2 1/01/2003 1 H 8 1/01/2003 2 H 7 1/01/2003 2 L 9 1/01/2003 2 L 1 """ df = pd.read_csv(StringIO(data), sep=" ") print("It is easy to go from this", df, sep="\n") df2 = df.groupby(["Date", "id", "class"]).agg({"amount": "sum"}) print("\nTo this", df2, sep="\n") print("\nor this.", df2.reset_index(), sep="\n") df3 = pd.pivot_table(df, index=["Date", "id"], columns=["class"], values=["amount"]) print("\nBut neither of those have H and L as columns. Maybe a pivot table is a better idea.", df3, sep="\n") df4 = df3.reset_index() print( "\nNow we have H and L columns, but both the rows and columns are multi-index. Flatten the row index.", df4, sep="\n", ) df4.columns = ["Date", "id", "H", "L"] print("\nAnd flatten the column index.", df4, sep="\n")
RE: Create new column in dataframe - Pedroski55 - Jun-23-2024 More fun than Suduko on a Sunday morning! Is this what you want? I hope the experts here can make this shorter! import pandas as pd cols = ['Date', 'id', 'class', 'amount'] string = '''1/12/2002 1 L 5 1/12/2002 1 H 6 1/12/2002 1 L 7 1/12/2002 2 L 3 1/01/2003 1 L 2 1/01/2003 1 H 8 1/01/2003 2 H 7 1/01/2003 2 L 9 1/01/2003 2 L 1''' # given the above data, make a df a = string.split('\n') b = [c.split() for c in a] df = pd.DataFrame(b, columns=cols) # amount is a string, change to numeric df["amount"] = pd.to_numeric(df["amount"]) # get all the Ls dfL = df.loc[df['class'] == 'L'] .copy() # rename amount to L_class since that is the desired output format dfL = dfL.rename(columns={'amount': 'L_class'}) # get all Hs dfH = df.loc[df['class'] == 'H'].copy() # rename amount to H_class since that is the desired output format dfH = dfH.rename(columns={'amount': 'H_class'}) # merge dfL and dfH new = pd.merge(dfL, dfH, how='outer', left_on=['Date', 'id', "class"], right_on = ['Date', 'id', "class"]) # looks good # do the groupby and add the numbers in L_class and H_class HL = new.groupby(["Date", "id", "class"]).agg({"L_class": "sum", "H_class": "sum"}).copy()Gives this:
RE: Create new column in dataframe - snippsat - Jun-23-2024 import pandas as pd data = { 'Date': ['1/12/2002', '1/12/2002', '1/12/2002', '1/12/2002', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003', '1/01/2003'], 'id': [1, 1, 1, 2, 1, 1, 2, 2, 2], 'class': ['L', 'H', 'L', 'L', 'L', 'H', 'H', 'L', 'L'], 'amount': [5, 6, 7, 3, 2, 8, 7, 9, 1] } df = pd.DataFrame(data) # Pivot the DataFrame pivot_df = df.pivot_table(index=['Date', 'id'], columns='class', values='amount', aggfunc='sum', fill_value=0) pivot_df.columns = [f'{col}_class' for col in pivot_df.columns] # Reset index to get Date and id back as columns pivot_df = pivot_df.reset_index() print(pivot_df)
|