Python Forum

groupby on var with missing values error - zenvega - May-07-2021

I am simulating credit card utilization data with some missing values, and I want to create a dual-axis plot from it:

x axis - bins [Missing, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y1 axis - counts
y2 axis - mean

Here is my code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import random

# create credit card utilization and cap it to 1.
mu, sigma = 0.5, 0.15  # mean and standard deviation
s = pd.DataFrame({'cc_util': np.random.normal(mu, sigma, 1000)})
s['cc_util'] = s['cc_util'].clip(upper=1)

# insert random nan (Null values)
ix = [(row, col) for row in range(s.shape[0]) for col in range(s.shape[1])]
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    s.iat[row, col] = np.nan

# create decile bin boundaries
cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# bin the values; astype(str) turns each interval into a string label
s['decile_grp'] = pd.cut(s['cc_util'], bins=cut_bins).astype(str)
# note: `summary` is never defined in this snippet, so this line raises a NameError as posted
summary['cc_util_mean'] = s.groupby(['decile_grp'])['cc_util'].mean()
# per-bin count and mean (dropna=False is meant to keep the missing-value group)
summary_count = pd.DataFrame(s.groupby(['decile_grp'], dropna=False)['cc_util'].count()).rename(columns={'cc_util': 'cc_util_count'})
summary_mean = pd.DataFrame(s.groupby(['decile_grp'], dropna=False)['cc_util'].mean()).rename(columns={'cc_util': 'cc_util_mean'})

result = pd.merge(summary_count, summary_mean, how="left",on='decile_grp')
result.reset_index(inplace=True)

fig, ax = plt.subplots(figsize=(15, 8))
sns.barplot(data=result, x='decile_grp', y='cc_util_count', palette="Blues_d", ax=ax)
ax2 = ax.twinx()
sns.lineplot(data=result, x='decile_grp', y='cc_util_mean', ax=ax2, color='tomato')

However, I get the following table, which doesn't make sense:

| decile_grp | cc_util_mean |
| ---------- | ------------ |
| (0.0, 0.1] | NaN |
| (0.1, 0.2] | NaN |
| (0.2, 0.3] | NaN |
| (0.3, 0.4] | NaN |
| (0.4, 0.5] | NaN |
| (0.5, 0.6] | NaN |
| (0.6, 0.7] | NaN |
| (0.7, 0.8] | NaN |
| (0.8, 0.9] | NaN |
| (0.9, 1.0] | NaN |
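
One possible explanation, offered as a guess because `summary` is not shown: assigning a Series to a DataFrame column aligns on the index, so if `summary` was indexed by the interval categories while the groupby result is indexed by string labels, nothing matches and every row comes out as NaN. The sketch below sidesteps that by building the count and the mean from a single groupby. The fixed seed, the names `bin_labels` and `order`, and the "Missing" label are illustrative choices, not part of the original post. It uses `size()` instead of `count()`, because `count()` skips NaN and would report 0 for the missing group.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, only for reproducibility

# Simulate utilization and clip it to [0, 1].
cc_util = pd.Series(rng.normal(0.5, 0.15, 1000)).clip(lower=0, upper=1)

# Blank out a random 10% of the rows.
missing_pos = rng.choice(len(cc_util), size=100, replace=False)
cc_util.iloc[missing_pos] = np.nan

s = pd.DataFrame({"cc_util": cc_util})

# Decile bin edges and hand-written string labels (hypothetical names).
cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
bin_labels = [f"({lo:.1f}, {hi:.1f}]" for lo, hi in zip(cut_bins[:-1], cut_bins[1:])]
order = ["Missing"] + bin_labels  # "Missing" goes first on the x axis

# include_lowest=True puts values equal to 0 into the first bin.
binned = pd.cut(s["cc_util"], bins=cut_bins, labels=bin_labels, include_lowest=True)

# Give NaN its own category instead of converting everything to str.
s["decile_grp"] = (
    binned.cat.add_categories("Missing")
    .fillna("Missing")
    .cat.reorder_categories(order)
)

# One groupby, two aggregates; size() counts rows even where cc_util is NaN.
grouped = s.groupby("decile_grp", observed=False)["cc_util"]
result = pd.DataFrame(
    {"cc_util_count": grouped.size(), "cc_util_mean": grouped.mean()}
).reset_index()

print(result)
```

`result` keeps the column names used in the original code, so it should drop into the existing `sns.barplot` and `sns.lineplot` calls. The mean for the Missing bin stays NaN, since there is nothing to average there.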