I'm trying to create a pivot table in Python, but when I compare the sum of my count on my original dataframe with the sum of the count in the PivotTable it does not add up. I've tried filling NaN to see if this is not maybe the reason, but no luck. Below is my code as well as the output.
import pandas as pd import numpy as np # Ensure that missing values are pupulated for the class variables merged_df_stage['Reporting_Period'].fillna('Unknown', inplace=True) merged_df_stage['New_Customer'].fillna('Unknown', inplace=True) merged_df_stage['InceptionBand'].fillna('Unknown', inplace=True) merged_df_stage['PersistencyBand'].fillna('Unknown', inplace=True) merged_df_stage['Decision'].fillna('Unknown', inplace=True) merged_df_stage['Activation_Ind'].fillna('Unknown', inplace=True) merged_df_stage['Sale_Indicator'].fillna('Unknown', inplace=True) merged_df_stage['RPC'].fillna('Unknown', inplace=True) merged_df_stage['Prf1'].fillna('Unknown', inplace=True) merged_df_stage['Prf2'].fillna('Unknown', inplace=True) # Add additional count columns merged_df_stage['Count_val'] = 1 merged_df_stage['Count_val'] = merged_df_stage['Count_val'].fillna(0) merged_df_stage['Goods_Count'] = np.where(merged_df_stage['Prf2'] == '70: goods in 3M', 1, 0) merged_df_stage['Goods_Count'] = merged_df_stage['Goods_Count'].fillna(0) merged_df_stage['Bads_Count'] = np.where(merged_df_stage['Prf2'] == '30: bads in 3M', 1, 0) merged_df_stage['Bads_Count'] = merged_df_stage['Bads_Count'].fillna(0) merged_df_stage['Sale_Indicator_Count'] = np.where(merged_df_stage['Sale_Indicator'] == 'Y', 1, 0) merged_df_stage['Sale_Indicator_Count'] = merged_df_stage['Sale_Indicator_Count'].fillna(0) merged_df_stage['RPC_Count'] = np.where(merged_df_stage['RPC'] == 1, 1, 0) merged_df_stage['RPC_Count'] = merged_df_stage['RPC_Count'].fillna(0) # Calculate the sum of the 'Count' column count_sum = merged_df_stage['Count_val'].sum() print(f"Sum of Count: {count_sum}") # Create pivot table with counts on measure fields and additional counts pivot_table = pd.pivot_table(merged_df_stage, values=['Count_val', 'Goods_Count', 'Bads_Count', 'Sale_Indicator_Count', 'RPC_Count'], index=['Reporting_Period', 'New_Customer', 'InceptionBand', 'PersistencyBand', 'Decision', 'Activation_Ind', 'Sale_Indicator', 'RPC', 'Prf1', 'Prf2'], aggfunc={'Count_val': 'count', 'Goods_Count': 'sum', 'Bads_Count': 'sum', 'Sale_Indicator_Count': 'sum', 'RPC_Count': 'sum'}) # Reset index to convert the pivot table into a regular dataframe pivot_table.reset_index(inplace=True) # Rename the measure columns pivot_table.rename(columns={'Count_val': 'Count', 'Goods_Count': 'Goods_Count', 'Bads_Count': 'Bads_Count', 'Sale_Indicator_Count': 'Sale_Indicator_Count', 'RPC_Count': 'RPC_Count'}, inplace=True) # Testing Counts written to the pivot table to see that they add up to shape attributes: count_sum = pivot_table['Count'].sum() print(f"Sum of Pivot Count: {count_sum}")
Output:Sum of Count: 945947
Sum of Pivot Count: 53894
Larz60+ write Jun-29-2023, 02:59 PM:
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed this time. Please use BBCode tags in future posts.
You can use icons at top of post area.
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed this time. Please use BBCode tags in future posts.
You can use icons at top of post area.