Python Forum

Full Version: Monthly sales, standard deviation
Hi there, I am back to the knowledge well!

Using pandas I have a dataframe with Date, Item#, Qty and 12 months of data. I want to do a std dev calc to understand the monthly variability for each Item#.

The problem is for many Item#'s the sales are very sporadic: I may only sell Item XYZ in Feb and Mar, and zero the rest of the months of the year.

I am using this code to do the heavy lifting:

df.groupby('Item#').resample('M', on='Date')['Qty'].sum()

For some Item#'s it fills zeros into the empty months (GOOD!), but in many cases it returns only, say, two months of data instead of 12, which then makes the std dev calculation incorrect.

Can someone help me understand this erratic behaviour of the resample method? How can I work around this problem?

Many thanks in advance for your help!
uh-oh. no responses. Was this a really dumb question? :(
Maybe they are waiting for an example?
Right, makes sense. It's just tough to make a concise example b/c the data is so lengthy.

I did some more experimenting and I've figured out what's going on. Resample() will interpolate, but not extrapolate: it fills in the gaps between each group's first and last dates, but won't extend beyond them. So if I have sales of an Item in Jan and Dec, I will get 12 values, even if there are zero sales in Feb-thru-Nov. If I have sales in Mar and Jun only, but none in Jan-Feb or Jul-Dec, I will get only 4 values (Mar, Apr, May, Jun).
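A small reproducible sketch of that behaviour (the item names and quantities here are made up for illustration):

```python
import pandas as pd

# Hypothetical data: item ABC sells in Jan and Dec,
# item XYZ sells only in Mar and Jun.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-15', '2023-12-10',
                            '2023-03-05', '2023-06-20']),
    'Item#': ['ABC', 'ABC', 'XYZ', 'XYZ'],
    'Qty': [5, 7, 3, 4],
})

monthly = df.set_index('Date').groupby('Item#')['Qty'].resample('M').sum()

# ABC spans Jan..Dec, so resample fills in all 12 month-ends (with zeros
# for the empty months); XYZ spans only Mar..Jun, so it gets just 4 rows.
print(monthly.loc['ABC'].size)  # 12
print(monthly.loc['XYZ'].size)  # 4
```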

So, because I have sales of something in every month of the year, my solution was to unstack() the dataframe after the resample(). That forced the resampled 12 months along the top as columns with a ton of NaN's. I filled the NaN's with zeros, then stack(), and presto, 12 values for every item.
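Continuing the toy example from above, the unstack/fill/stack trick looks like this (again with made-up data, and assuming at least one item spans all 12 months):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-15', '2023-12-10',
                            '2023-03-05', '2023-06-20']),
    'Item#': ['ABC', 'ABC', 'XYZ', 'XYZ'],
    'Qty': [5, 7, 3, 4],
})

monthly = df.set_index('Date').groupby('Item#')['Qty'].resample('M').sum()

# unstack() spreads the month-ends across the columns; items with a short
# date span get NaN for the months outside their own range. Fill those
# with 0 and stack() back into one value per item per month.
filled = monthly.unstack('Date').fillna(0).stack()

# Every item now has 12 monthly values, so std() runs over a full year.
std_dev = filled.groupby('Item#').std()
print(filled.loc['XYZ'].size)  # 12
```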

Seems a little kludgy, maybe there's a better way?