Python Forum
Monthly sales, standard deviation - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Monthly sales, standard deviation (/thread-28716.html)



Monthly sales, standard deviation - Lyle - Jul-31-2020

Hi, there. I am back to the knowledge well!

Using pandas I have a dataframe with Date, Item#, Qty and 12 months of data. I want to do a std dev calc to understand the monthly variability for each Item#.

The problem is for many Item#'s the sales are very sporadic: I may only sell Item XYZ in Feb and Mar, and zero the rest of the months of the year.

I am using this code to do the heavy lifting:

df.groupby('Item#').resample('M').sum()

For some Item#'s it is forcing zeros into the empty months (GOOD!), but in many cases it is just showing, say, two months of data, instead of 12, which then makes the std dev calculation incorrect.

Can someone help me understand why this erractic behaviour of the resample method? How can I workaround this problem?

Many thanks in advance for your help!


RE: Monthly sales, standard deviation - Lyle - Aug-07-2020

uh-oh. no responses. Was this a really dumb question? :(


RE: Monthly sales, standard deviation - deanhystad - Aug-07-2020

Maybe they are waiting for an example?


RE: Monthly sales, standard deviation - Lyle - Aug-07-2020

Right, makes sense. It's just tough to make a concise example b/c the data is so lengthy.

I did some more experimenting and I've figured out what's going on. Resample() will interpolate, but not extrapolate. So if I have sales of an Item in Jan and Dec, I will get 12 values, even if there is zero sales in Feb-thru-Nov. If I have sales in Mar and Jun only, but none in Jan-Feb or Jul-Dec, I will get 4 values (Mar, Apr, May, Jun).

So, because I have sales of something in every month of the year, my solution was to unstack() the dataframe after the resample(). That forced the resampled 12 months along the top as columns with a ton of NaN's. I filled the NaN's with zeros, then stack(), and presto, 12 values for every item.

Seems a little kludgy, maybe there's a better way?