I am simulating credit card utilization data with some missing values. Then I want to create a dual-axis plot:

- x axis: bins [Missing, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
- y1 axis: count of observations in each bin
- y2 axis: mean utilization in each bin
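For reference, the x-axis categories listed above (a `Missing` bin followed by ten deciles) can be built as an ordered categorical. This is a minimal sketch under a few assumptions that are not in the original: NumPy's `default_rng` drives the simulation, exactly 10% of rows are blanked out, and utilization is clipped to [0, 1] (the original caps only at 1, so the rare negative draws would fall outside the bins and end up mixed in with the missing values). `include_lowest=True` is needed so that values equal to 0 land in the first interval.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible

# Simulated utilization. Clipping at 0 as well as at 1 is an assumption:
# it keeps rare negative draws from falling outside the bins.
s = pd.DataFrame({"cc_util": np.clip(rng.normal(0.5, 0.15, 1000), 0, 1)})

# Blank out exactly 10% of the rows, chosen at random without replacement.
nan_rows = rng.choice(len(s), size=len(s) // 10, replace=False)
s.loc[nan_rows, "cc_util"] = np.nan

cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# include_lowest=True puts values equal to 0 into the first interval.
grp = pd.cut(s["cc_util"], bins=cut_bins, include_lowest=True)
# Turn the Interval categories into plain strings such as "(0.1, 0.2]".
grp = grp.cat.rename_categories(str)
# Keep NaN as an explicit "Missing" category rather than the string "nan".
grp = grp.cat.add_categories("Missing").fillna("Missing")
# Put "Missing" first so it becomes the leftmost x-axis category.
s["decile_grp"] = grp.cat.reorder_categories(["Missing", *grp.cat.categories[:-1]])

print(s["decile_grp"].value_counts(sort=False))
```

With `include_lowest=True`, pandas labels the first interval `(-0.001, 0.1]` rather than `(0.0, 0.1]`; pass `labels=` to `pd.cut` if you prefer different tick text.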
Here is my code:
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import random

# create credit card utilization and cap it to 1.
mu, sigma = 0.5, 0.15  # mean and standard deviation
s = pd.DataFrame(np.random.normal(mu, sigma, 1000)).rename(columns={0: 'cc_util'})
s = pd.DataFrame(np.where(s['cc_util'] > 1, 1, s['cc_util'])).rename(columns={0: 'cc_util'})

# insert random nan (Null values)
ix = [(row, col) for row in range(s.shape[0]) for col in range(s.shape[1])]
for row, col in random.sample(ix, int(round(.1 * len(ix)))):
    s.iat[row, col] = np.nan

# create decile bin boundaries
cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
s['decile_grp'] = pd.cut(s['cc_util'], bins=cut_bins, labels=None).astype(str)

summary['cc_util_mean'] = s.groupby(['decile_grp'])['cc_util'].mean()

summary_count = pd.DataFrame(s.groupby(['decile_grp'], dropna=False)['cc_util'].count()).rename(columns={'cc_util': 'cc_util_count'})
summary_mean = pd.DataFrame(s.groupby(['decile_grp'], dropna=False)['cc_util'].mean()).rename(columns={'cc_util': 'cc_util_mean'})

result = pd.merge(summary_count, summary_mean, how="left", on='decile_grp')
result.reset_index(inplace=True)

fig, ax = plt.subplots(figsize=(15, 8))
sns.barplot(data=result, x='decile_grp', y='cc_util_count', palette="Blues_d")
ax2 = ax.twinx()
sns.lineplot(data=result, x='decile_grp', y='cc_util_mean', ax=ax2, color='tomato')
```

However, I get the following table, which doesn't make sense:
| decile\_grp | cc\_util\_mean |
| ----------- | -------------- |
| (0.0, 0.1] | NaN |
| (0.1, 0.2] | NaN |
| (0.2, 0.3] | NaN |
| (0.3, 0.4] | NaN |
| (0.4, 0.5] | NaN |
| (0.5, 0.6] | NaN |
| (0.6, 0.7] | NaN |
| (0.7, 0.8] | NaN |
| (0.8, 0.9] | NaN |
| (0.9, 1.0] | NaN |
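The all-NaN column is most likely an index-alignment effect. `summary` is never created in the snippet, so it must be a DataFrame left over from an earlier run, and assigning a Series to a DataFrame column aligns on the index: every row of `summary` whose label does not exactly match one of the new string labels gets NaN. Two other details affect the plot. `.astype(str)` turns missing values into the string `"nan"`, so `dropna=False` has nothing to keep and that group sorts after the intervals; and `.count()` skips NaN, so the missing bin always shows a count of 0.

A minimal sketch of one alternative follows, assuming a small hand-made sample (the values are hypothetical), a `"Missing"` category placed first, and `include_lowest=True`. It builds both columns from one grouped object, counts rows with `size` rather than `count`, and draws the mean line with `ax2.plot` at the integer positions where seaborn's categorical `barplot` places its bars, instead of using `sns.lineplot`, which makes the alignment between points and bars explicit.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical sample: eight rows, two of them missing.
s = pd.DataFrame({"cc_util": [np.nan, 0.05, 0.15, 0.18, 0.55, np.nan, 0.95, 1.0]})

cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
grp = pd.cut(s["cc_util"], bins=cut_bins, include_lowest=True).cat.rename_categories(str)
# Explicit "Missing" category (first on the axis) instead of astype(str) -> "nan".
grp = grp.cat.add_categories("Missing").fillna("Missing")
s["decile_grp"] = grp.cat.reorder_categories(["Missing", *grp.cat.categories[:-1]])

# Both columns come from the same grouped object, so they share one index.
# size() counts rows, including NaN ones; mean() skips NaN.
# observed=False keeps empty bins (count 0, mean NaN).
g = s.groupby("decile_grp", observed=False)["cc_util"]
summary = (
    pd.DataFrame({"cc_util_count": g.size(), "cc_util_mean": g.mean()})
    .rename_axis("decile_grp")
    .reset_index()
)

fig, ax = plt.subplots(figsize=(15, 8))
sns.barplot(data=summary, x="decile_grp", y="cc_util_count", color="steelblue", ax=ax)
ax2 = ax.twinx()
# Bars sit at x = 0, 1, 2, ...; plot the means at the same positions.
ax2.plot(np.arange(len(summary)), summary["cc_util_mean"], color="tomato", marker="o")
ax.set_xlabel("cc_util bin")
ax.set_ylabel("count")
ax2.set_ylabel("mean cc_util")
plt.show()
```

Empty bins keep a count of 0 and a NaN mean, so the line shows a gap there rather than a misplaced point.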