Nov-22-2022, 01:22 PM
I cannot figure out how to speed up the following problem by vectorising.
I have a big (millions of rows) Pandas DataFrame of price changes over time. I need to set a signal of +1 if the price goes up by at least $1, and -1 if the price goes down by at least $1. I set these signals in a column Signal.

I am not comparing the current row to the previous row only. I start with the price in the first row and iterate through the rows until the price has moved by $1 or more in either direction relative to it. That row then becomes my new starting point, and I compare subsequent rows against its price, and so on. If I compared the current row to the previous row only, this would be easy to do with np.where and df.shift().

At the moment I use iterrows() with two if statements that check whether the price has gone up or down by $1, and a variable that saves the price from the row we are comparing new prices with. However, this is very slow.

See attached image for an example.
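For reference, the logic described above can be sketched as a plain loop over a NumPy array instead of iterrows(). This is only an illustration of the described rule, not the original code: the function name threshold_signals, the column name Price, the sample prices, and the choice of 0 for rows without a signal are all assumptions.

```python
import numpy as np
import pandas as pd


def threshold_signals(prices, step=1.0):
    """Return +1/-1 where the price moved by at least `step` from the
    current reference price, and 0 elsewhere (0 is an assumed filler)."""
    prices = np.asarray(prices, dtype=float)
    signals = np.zeros(len(prices), dtype=np.int8)
    if len(prices) == 0:
        return signals
    ref = prices[0]  # reference price starts at the first row
    for i in range(1, len(prices)):
        diff = prices[i] - ref
        if diff >= step:
            signals[i] = 1
            ref = prices[i]  # this row becomes the new starting point
        elif diff <= -step:
            signals[i] = -1
            ref = prices[i]  # this row becomes the new starting point
    return signals


# Hypothetical sample data; the column name "Price" is an assumption.
df = pd.DataFrame({"Price": [10.0, 10.4, 11.1, 11.5, 10.0, 9.5, 9.0, 10.2]})
df["Signal"] = threshold_signals(df["Price"].to_numpy())
print(df)
```

Because each reference price depends on where the previous signal fired, the rule is sequential and does not reduce to a single np.where/df.shift() expression. Looping over the raw array avoids building a Series object for every row, which is where most of the iterrows() cost tends to come from. How much faster it is on millions of rows would need to be measured.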