groupby on var with missing values error - zenvega - May-07-2021

I am simulating credit card utilization data with some missing values. Then I want to create a dual-axis plot:

x axis - bins [Missing, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y1 axis - counts
y2 axis - mean

Here is my code:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import random

# create credit card utilization and cap it to 1.
mu, sigma = 0.5, 0.15  # mean and standard deviation
s = pd.DataFrame(np.random.normal(mu, sigma, 1000)).rename(columns={0: 'cc_util'})
s = pd.DataFrame(np.where(s['cc_util'] > 1, 1, s['cc_util'])).rename(columns={0: 'cc_util'})

# insert random nan (Null values)
ix = [(row, col) for row in range(s.shape[0]) for col in range(s.shape[1])]
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    s.iat[row, col] = np.nan

# create decile bin boundaries
cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
s['decile_grp'] = pd.cut(s['cc_util'], bins=cut_bins, labels=None).astype(str)

summary['cc_util_mean'] = s.groupby(['decile_grp'])['cc_util'].mean()
summary_count = pd.DataFrame(s.groupby(['decile_grp'], dropna=False)['cc_util'].count()).rename(columns={'cc_util': 'cc_util_count'})
summary_mean = pd.DataFrame(s.groupby(['decile_grp'], dropna=False)['cc_util'].mean()).rename(columns={'cc_util': 'cc_util_mean'})
result = pd.merge(summary_count, summary_mean, how="left", on='decile_grp')
result.reset_index(inplace=True)

fig, ax = plt.subplots(figsize=(15, 8))
sns.barplot(data=result, x='decile_grp', y='cc_util_count', palette="Blues_d")
ax2 = ax.twinx()
sns.lineplot(data=result, x='decile_grp', y='cc_util_mean', ax=ax2, color='tomato')
```

However, I get the following table, which doesn't make sense:

| decile_grp | cc_util_mean |
| ---------- | ------------ |
| (0.0, 0.1] | NaN |
| (0.1, 0.2] | NaN |
| (0.2, 0.3] | NaN |
| (0.3, 0.4] | NaN |
| (0.4, 0.5] | NaN |
| (0.5, 0.6] | NaN |
| (0.6, 0.7] | NaN |
| (0.7, 0.8] | NaN |
| (0.8, 0.9] | NaN |
| (0.9, 1.0] | NaN |
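
A few things in the posted snippet may be contributing to this. `summary` is assigned into but never defined in the code shown. If it already existed with a different index, the column assignment would align on that index and could produce all-NaN values, though that is only a guess. Separately, `.astype(str)` turns missing bins into the literal string `'nan'`, so `dropna=False` has nothing left to act on, and `.count()` only counts non-null `cc_util` values, so the missing group would report a count of 0. Below is a minimal sketch of one way to get a `Missing` bin with both a row count and a mean. It is not the only approach. The fixed seed, the explicit `labels` list, the `"Missing"` category name, and the use of `size` in place of `count` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only

# Simulate utilization and cap it at 1, as in the post
s = pd.DataFrame({"cc_util": rng.normal(0.5, 0.15, 1000)})
s["cc_util"] = s["cc_util"].clip(upper=1)

# Blank out 10% of the rows at random
missing_rows = rng.choice(len(s), size=len(s) // 10, replace=False)
s.loc[missing_rows, "cc_util"] = np.nan

cut_bins = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# Explicit string labels (illustrative) so every category is a plain string
labels = [f"({lo:.1f}, {hi:.1f}]" for lo, hi in zip(cut_bins[:-1], cut_bins[1:])]

# Keep the categorical dtype instead of casting to str, then give
# rows that fall in no bin (NaN or out of range) their own category
grp = pd.cut(s["cc_util"], bins=cut_bins, labels=labels)
grp = grp.cat.add_categories("Missing").fillna("Missing")
s["decile_grp"] = grp.cat.reorder_categories(["Missing"] + labels)

# "size" counts rows, including those whose cc_util is NaN;
# the mean of the Missing group is NaN when all its values are NaN
summary = (
    s.groupby("decile_grp", observed=False)
    .agg(cc_util_count=("cc_util", "size"), cc_util_mean=("cc_util", "mean"))
    .reset_index()
)
print(summary)
```

The resulting `summary` frame has one row per bin with `Missing` first, and could be passed to the `sns.barplot` and `sns.lineplot` calls from the post in place of `result`.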