Python Forum
groupby on var with missing values error
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
groupby on var with missing values error
#1
I am simulating credit card utilization data with some missing values. Then I want to create a dual axis plot.

x axis - bins [Missing, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y1 axis - counts
y2 axis - mean

here is my code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import random

# create credit card utilization and cap it to 1.
mu, sigma = 0.5, 0.15 # mean and standard deviation
s = pd.DataFrame(np.random.normal(mu, sigma, 1000)).rename(columns={0: 'cc_util'})
s = pd.DataFrame(np.where(s['cc_util'] > 1, 1,s['cc_util'])).rename(columns={0: 'cc_util'})

# insert random nan (Null values)
ix = [(row, col) for row in range(s.shape[0]) for col in range(s.shape[1])]
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    s.iat[row, col] = np.nan
# create decile bin boundaries
cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
s['decile_grp'] = pd.cut(s['cc_util'], bins=cut_bins,labels=None).astype(str)
summary['cc_util_mean'] = s.groupby(['decile_grp'])['cc_util'].mean()
summary_count = pd.DataFrame(s.groupby(['decile_grp'],dropna=False)['cc_util'].count()).rename(columns={'cc_util': 'cc_util_count'})
summary_mean = pd.DataFrame(s.groupby(['decile_grp'],dropna=False)['cc_util'].mean()).rename(columns={'cc_util': 'cc_util_mean'})

result = pd.merge(summary_count, summary_mean, how="left",on='decile_grp')
result.reset_index(inplace=True)

fig, ax = plt.subplots(figsize=(15, 8))
sns.barplot(data=result, x='decile_grp', y='cc_util_count', palette="Blues_d")
ax2 = ax.twinx()
sns.lineplot(data=result, x='decile_grp', y='cc_util_mean', ax=ax2, color='tomato')
However, I get the following table that doesn't make sense
| decile\_grp | cc\_util\_mean |
| ----------- | -------------- |
| (0.0, 0.1 | NaN |
| (0.1, 0.2 | NaN |
| (0.2, 0.3 | NaN |
| (0.3, 0.4 | NaN |
| (0.4, 0.5 | NaN |
| (0.5, 0.6 | NaN |
| (0.6, 0.7 | NaN |
| (0.7, 0.8 | NaN |
| (0.8, 0.9 | NaN |
| (0.9, 1.0 | NaN |
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  GroupBy - Sum = Error [datetime64 type does not support sum operations] BSDevo 4 2,544 Oct-27-2023, 07:22 PM
Last Post: BSDevo
  Pandas + Groupby + Filter unique values JosepMaria 1 2,839 Jun-15-2020, 08:15 AM
Last Post: JosepMaria
  read_csv error and rows/columns missing karlito 9 5,228 Nov-11-2019, 06:48 AM
Last Post: karlito
  Filling in missing values melm0 4 5,229 Aug-09-2017, 03:59 PM
Last Post: radioactive9

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020