When I resample monthly data to quarterly, I want the last value, which is NaN, to remain NaN. How should I tweak my code?
Thank you
df = pd.read_excel(input_file, sheet_name='Sheet1', usecols='A:D', na_values='ND', index_col=0, header=0)
df.index.names = ['Period']
df.index = pd.to_datetime(df.index)
q0 = pd.Series(df['HS6P1'], index=df.index)
m1 = q0.resample('Q').sum()
Current Output
Period
1989-03-31 212.7
1989-06-30 302.1
1989-09-30 272.1
1989-12-31 163.9
Desired Output
Period
1989-03-31 212.7
1989-06-30 302.1
1989-09-30 272.1
1989-12-31 NaN
Try
m1 = q0.resample('Q').sum(skipna=False)
(Dec-07-2022, 01:50 PM) deanhystad wrote: Try
m1 = q0.resample('Q').sum(skipna=False)
I get an error:
Error:
UnsupportedFunctionCall: numpy operations are not valid with resample. Use .resample(...).sum() instead
Thanks in advance if you could help!
I think that answers your question. With resample().sum(), your only choice is to ignore NaNs. It looks like you'll have to do most of the work yourself.
resample() is really just a special version of groupby(). The primary difference is that resample() only groups by date/time. Calling resample() on a series returns a DatetimeIndexResampler object, which you can iterate over to access the groups. Each group pairs a timestamp label with a series of values. That series can be summed, and when summing a series you can set skipna=False.
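import pandas as pd
from numpy import nan

# Float values so the series can hold NaN; 'min' is the minute alias ('T' is deprecated in pandas 2.2+)
series = pd.Series(range(6), dtype=float, index=pd.date_range('1/1/2013', periods=6, freq='min'))
series.iloc[5] = nan  # set the last value by position
groups = series.resample('2min')
for x in groups:
    # x is a (timestamp, sub-series) pair; sum the sub-series without skipping NaN
    print(x[1].sum(skipna=False))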
Output:
1.0
5.0
nan
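To make the x[1] in the loop above less cryptic, here is a minimal sketch of what iterating the resampler yields. The sample data and the names pairs, label and group are made up for illustration. Each item is a (bin label, sub-series) pair, so it can be unpacked directly instead of indexed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: six one-minute observations, the last one missing.
series = pd.Series(
    [0.0, 1.0, 2.0, 3.0, 4.0, np.nan],
    index=pd.date_range("2013-01-01", periods=6, freq="min"),
)

# Iterating a resampler yields (bin label, sub-series) pairs.
pairs = [(label, group) for label, group in series.resample("2min")]

for label, group in pairs:
    # group is an ordinary Series, so Series.sum(skipna=False) is available.
    print(label, group.tolist(), group.sum(skipna=False))
```

Unpacking into label and group is equivalent to using x[0] and x[1]; it only makes the intent easier to read.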
Using this info, it is easy to write a resampler that doesn't ignore NaNs.
import pandas as pd
import numpy as np

def no_skip_resampler(series, period):
    """Sum each resample bin, giving NaN for any bin that contains a NaN."""
    groups = series.resample(period)
    # Iterating the resampler yields (bin label, sub-series) pairs
    sums = {label: group.sum(skipna=False) for label, group in groups}
    return pd.Series(sums)

series = pd.Series(range(6), dtype=float, index=pd.date_range('1/1/2013', periods=6, freq='min'))
series.iloc[5] = np.nan
print("Resampled")
print(series.resample('2min').sum())
print("\n Reconstructed")
print(no_skip_resampler(series, '2min'))
Output:
Resampled
2013-01-01 00:00:00 1.0
2013-01-01 00:02:00 5.0
2013-01-01 00:04:00 4.0
Freq: 2T, dtype: float64
Reconstructed
2013-01-01 00:00:00 1.0
2013-01-01 00:02:00 5.0
2013-01-01 00:04:00 NaN
dtype: float64
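For the original monthly-to-quarterly question, the same idea might look like the sketch below. The data is invented, and 'QS' (quarter start) is used only so the snippet runs unchanged on older and newer pandas; the original post uses 'Q', which labels bins by quarter end. Two options are shown: applying a per-bin sum with skipna=False, and the min_count argument of the resampler's sum(). They are not identical: min_count=3 also returns NaN for a quarter that simply has fewer than three rows, even if none of them is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data for 1989; December is missing.
monthly = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, np.nan],
    index=pd.date_range("1989-01-01", periods=12, freq="MS"),
)

# Default behaviour: NaN is skipped, so Q4 becomes 100 + 110.
default = monthly.resample("QS").sum()

# Option A: apply a per-bin sum that propagates NaN.
strict = monthly.resample("QS").apply(lambda g: g.sum(skipna=False))

# Option B: require three non-missing values per quarter.
complete = monthly.resample("QS").sum(min_count=3)

print(default)
print(strict)
print(complete)
```

Option A mirrors no_skip_resampler() most closely; Option B is shorter but assumes every quarter is expected to hold exactly three monthly values.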