Python Forum
PivotTable help needed please
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
PivotTable help needed please
#1
I'm trying to create a pivot table in Python, but when I compare the sum of my count on my original dataframe with the sum of the count in the PivotTable it does not add up. I've tried filling NaN to see if this is not maybe the reason, but no luck. Below is my code as well as the output.


import pandas as pd
import numpy as np

# Ensure that missing values are pupulated for the class variables
merged_df_stage['Reporting_Period'].fillna('Unknown', inplace=True)
merged_df_stage['New_Customer'].fillna('Unknown', inplace=True)
merged_df_stage['InceptionBand'].fillna('Unknown', inplace=True)
merged_df_stage['PersistencyBand'].fillna('Unknown', inplace=True)
merged_df_stage['Decision'].fillna('Unknown', inplace=True)
merged_df_stage['Activation_Ind'].fillna('Unknown', inplace=True)
merged_df_stage['Sale_Indicator'].fillna('Unknown', inplace=True)
merged_df_stage['RPC'].fillna('Unknown', inplace=True)
merged_df_stage['Prf1'].fillna('Unknown', inplace=True)
merged_df_stage['Prf2'].fillna('Unknown', inplace=True)

# Add additional count columns
merged_df_stage['Count_val'] = 1
merged_df_stage['Count_val'] = merged_df_stage['Count_val'].fillna(0)

merged_df_stage['Goods_Count'] = np.where(merged_df_stage['Prf2'] == '70: goods in 3M', 1, 0)
merged_df_stage['Goods_Count'] = merged_df_stage['Goods_Count'].fillna(0)

merged_df_stage['Bads_Count'] = np.where(merged_df_stage['Prf2'] == '30: bads in 3M', 1, 0)
merged_df_stage['Bads_Count'] = merged_df_stage['Bads_Count'].fillna(0)

merged_df_stage['Sale_Indicator_Count'] = np.where(merged_df_stage['Sale_Indicator'] == 'Y', 1, 0)
merged_df_stage['Sale_Indicator_Count'] = merged_df_stage['Sale_Indicator_Count'].fillna(0)

merged_df_stage['RPC_Count'] = np.where(merged_df_stage['RPC'] == 1, 1, 0)
merged_df_stage['RPC_Count'] = merged_df_stage['RPC_Count'].fillna(0)

# Calculate the sum of the 'Count' column
count_sum = merged_df_stage['Count_val'].sum()
print(f"Sum of Count: {count_sum}")

# Create pivot table with counts on measure fields and additional counts
pivot_table = pd.pivot_table(merged_df_stage, 
                             values=['Count_val', 'Goods_Count', 'Bads_Count', 'Sale_Indicator_Count', 'RPC_Count'], 
                             index=['Reporting_Period', 'New_Customer', 'InceptionBand', 'PersistencyBand', 'Decision', 'Activation_Ind', 'Sale_Indicator', 'RPC', 'Prf1', 'Prf2'], 
                             aggfunc={'Count_val': 'count', 'Goods_Count': 'sum', 'Bads_Count': 'sum', 'Sale_Indicator_Count': 'sum', 'RPC_Count': 'sum'})

# Reset index to convert the pivot table into a regular dataframe
pivot_table.reset_index(inplace=True)

# Rename the measure columns
pivot_table.rename(columns={'Count_val': 'Count', 'Goods_Count': 'Goods_Count', 'Bads_Count': 'Bads_Count', 'Sale_Indicator_Count': 'Sale_Indicator_Count', 'RPC_Count': 'RPC_Count'}, inplace=True)

# Testing Counts written to the pivot table to see that they add up to shape attributes:
count_sum = pivot_table['Count'].sum()
print(f"Sum of Pivot Count: {count_sum}")
Output:
Sum of Count: 945947 Sum of Pivot Count: 53894
Larz60+ write Jun-29-2023, 02:59 PM:
Please post all code, output and errors (it it's entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed this time. Please use BBCode tags in future posts.
You can use icons at top of post area.
Reply
#2
Disregard
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How to filter in a Pivottable Excel Bob333 0 1,646 Apr-14-2021, 11:55 AM
Last Post: Bob333

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020