Python Forum

Full Version: Pandas .rolling() with some calculations inside
Hey guys

So in short, my Pandas dataframe is something like this:

https://www.dropbox.com/s/ja6kn0f55599xul/test.csv?dl=1

I want to pull out something like this:

a = df.index - df.index.where(df.candle == df.candle -3)[0]
df['test'] = df.bid.rolling(int(a)).mean()
So basically, I want to look back X periods with .rolling(), but I don't know in advance how many, so I need to calculate X from the df.candle values. To be more precise, I want to look 3 candles back and use the resulting index difference as the window size inside .rolling().

I thought something like the above piece of code would work, but no, it's giving me errors. I also tried a few other approaches to no avail, so I've ended up a bit lost. The full csv dataframe is here: https://www.dropbox.com/s/9p3kt338bj20yd...8.csv?dl=1
Any suggestions or help will be very much appreciated, thanks! :)
Anyone? I haven't been able to get an answer for this on Stack Overflow and a few other places for a week now. Is it that difficult?
What error do you get? I don't know pandas, but I can at least try to poke it and see what happens.
Are you asking about applying a rolling window of variable width? If so, Pandas doesn't support such windows, except ones based on datetime-like columns. You can try to: 1) write all the logic in a for loop, e.g. using df.iterrows(); 2) use Cython: df.values is a NumPy array, and Cython supports NumPy arrays (that would be the most efficient option); 3) do some magic and turn your window-size condition into a datetime-like column, then apply a pandas rolling window (I am not sure if this is possible).
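To make the for-loop option concrete, here is a minimal sketch. It assumes df.candle is an integer candle number that never decreases from row to row, and that "3 candles back" means the window starts at the first row whose candle is at least the current candle minus 3. The function name and the `back` parameter are made up for illustration, and rows near the start simply get a shorter window.

```python
import pandas as pd

def rolling_mean_by_candle_loop(df, back=3):
    """Mean of df.bid from the first row of candle (current - back) up to the current row.

    Assumes df.candle is non-decreasing (hypothetical layout, not checked here).
    """
    candles = df["candle"].tolist()
    bids = df["bid"].tolist()
    out = []
    start = 0  # left edge of the window; only ever moves forward
    for i, candle in enumerate(candles):
        # Drop rows that belong to candles older than (candle - back).
        while candles[start] < candle - back:
            start += 1
        window = bids[start:i + 1]
        out.append(sum(window) / len(window))
    return pd.Series(out, index=df.index)
```

Usage would look like df['test'] = rolling_mean_by_candle_loop(df). The slice and sum inside the loop make this slow on very large frames, but it is easy to check by hand.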
It's interesting how pandas, with its myriad features, sometimes doesn't have solutions for simple problems. No one on the internet can answer this simple question. I've been trying to find a workaround for a couple of weeks already, and the whole process is so unintuitive and bulky that it makes me cringe. Brrr! I think the only elegant solution would be to go and learn Cython!?
Cython supports NumPy arrays and has examples of how to work with such arrays efficiently. You shouldn't have problems implementing your algorithm, I think. (If you don't need to do the same work repeatedly, you can probably just implement and run a long-running for loop in plain Python; on your (1GB) dataset it will probably finish in less than two weeks, or less than the time spent learning Cython :)
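Before reaching for Cython, it may be worth trying plain NumPy on the underlying arrays. The sketch below is not a pandas rolling feature; it swaps in a cumulative-sum trick. It assumes df.candle is a non-decreasing integer column and that the window for each row runs from the first row whose candle is at least the current candle minus 3 up to the row itself. The function name and `back` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_mean_by_candle(df, back=3):
    """Variable-width rolling mean of df.bid, window width derived from df.candle.

    Assumes df.candle is sorted in non-decreasing order (not checked here).
    """
    candle = df["candle"].to_numpy()
    bid = df["bid"].to_numpy(dtype=float)
    # For each row, index of the first row whose candle >= candle - back.
    start = np.searchsorted(candle, candle - back, side="left")
    # Window end is exclusive and sits just past the current row.
    end = np.arange(len(df)) + 1
    # Prefix sums let each window sum be computed as a difference.
    csum = np.concatenate(([0.0], np.cumsum(bid)))
    means = (csum[end] - csum[start]) / (end - start)
    return pd.Series(means, index=df.index)
```

Usage would be df['test'] = rolling_mean_by_candle(df). Because it relies on prefix sums, tiny floating-point differences from a direct mean are possible on long series.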