Apr-22-2024, 01:14 PM
(Apr-22-2024, 03:18 AM)deanhystad Wrote: If speed is important, don’t loop and really don’t build data frames a row at a time. This code is what I expect to see when someone has a speed complaint
Yes, I completely agree with you. In fact, my first pass over the dataframe uses boolean indexing to get rid of erroneous values, but it gets complicated when I need to get rid of edge cases. Unfortunately, this loop deals with an edge case that is simply impossible to remove with boolean indexing. Fortunately, the loop only has to iterate maybe 50-100 times in a dataset of about 1 million rows, because that's how often the edge case occurs, so the performance hit isn't too bad.
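Roughly, the two-pass pattern looks like this. It is a minimal sketch, not the real code: the column names, the `price > 0` filter, and the hard-coded `runs` list are all made up for illustration. The point is that the loop only collects row labels and the dataframe is modified once at the end, rather than being rebuilt a row at a time.

```python
import pandas as pd

# Hypothetical tape data; column names and values are assumptions.
df = pd.DataFrame({
    "price": [100.0, -1.0, 100.1, 95.0, 95.0, 100.2],
    "size": [200, 100, 300, 100, 100, 400],
})

# First pass: vectorized boolean indexing removes obviously bad rows.
# (price > 0 is a stand-in for whatever "erroneous" means in practice.)
clean = df[df["price"] > 0].reset_index(drop=True)

# Second pass: a short loop over the rare edge cases. Each entry is a
# (start, end) pair of row labels, end exclusive. These are hard-coded
# here; in practice they would come from the edge-case detection.
runs = [(2, 4)]
to_drop = []
for start, end in runs:
    to_drop.extend(range(start, end))

# Drop all collected rows in one call instead of editing the frame per row.
result = clean.drop(index=to_drop).reset_index(drop=True)
```

With only 50-100 runs per million rows, the loop cost is negligible next to the single `drop` call.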
https://www.elitetrader.com/et/threads/p...ket.52398/
What I'm doing is identifying "late prints" in the stock market tape and removing them, as they completely mess with stop losses in backtesting. Late prints are themselves an edge case, but single late prints are easy enough to identify with a boolean expression. The first edge case of the edge case is where you have multiple late prints in a row at the same price, and you don't know how many rows of them are coming. The edge case of the edge case of the edge case is where you have multiple late prints in a row, for an unknown number of rows, and the price is not the same for all of them (some may share a price, but not all do).
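To show why this needs a loop, here is a minimal sketch of one way to find such runs. The detection rule is an assumption made for illustration, not the actual criterion used: a print is treated as suspect when its price differs from the last accepted price by more than a hypothetical tolerance `tol`. Because each decision depends on the last accepted row, it does not reduce to a single boolean expression, but it covers all three cases (single prints, runs at one price, runs at mixed prices).

```python
def find_late_print_runs(prices, tol):
    """Return (start, end) index pairs, end exclusive, of suspect runs.

    ASSUMPTION: a row is suspect when its price differs from the last
    accepted price by more than `tol`. This is an illustrative rule only.
    """
    runs = []
    if not prices:
        return runs
    last_good = prices[0]  # assume the first print is valid
    run_start = None
    for i, price in enumerate(prices[1:], start=1):
        if abs(price - last_good) > tol:
            # Suspect row: open a run if one is not already open.
            if run_start is None:
                run_start = i
        else:
            # Price is back near the last accepted one: close any open run.
            if run_start is not None:
                runs.append((run_start, i))
                run_start = None
            last_good = price
    if run_start is not None:
        # The tape ended while a run was still open.
        runs.append((run_start, len(prices)))
    return runs


# Made-up prices: one single late print, one run at the same price,
# and one run at differing prices.
tape = [100.0, 100.1, 95.0, 100.2, 100.3, 95.0, 95.0,
        100.4, 100.5, 94.0, 96.5, 95.0, 100.6]
runs = find_late_print_runs(tape, tol=1.0)
```

A caveat with this rule: a genuine price move larger than `tol` would be flagged as a run that never closes, so a real version would also need a cap on run length or a check against timestamps.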