Posts: 279
Threads: 107
Joined: Aug 2019
My backtester program iterates over the rows of a .csv file. To backtest a strategy, over 99.5% of the rows correspond to unused options and should be skipped.
To select the correct rows, I have a code block that should execute only if three criteria are met: A, B, and C. I'm trying to work out the most efficient way to do this.
I have currently coded this as nested if statements:
if A:
    if B:
        if C:
            <instructions>
With this approach, it seems to me the most efficient order is from most to least restrictive. For example, say the data file has 1000 rows and five of them satisfy all three criteria (and will ultimately be processed). Consider two distributions of the criteria. In the first case, 50 rows satisfy A, 100 satisfy B, and 200 satisfy C. In the second case, 200 rows satisfy A, 100 satisfy B, and 50 satisfy C. With the tests ordered as above, the first case minimizes the number of rows to evaluate and is therefore most efficient. In the second case, the second test (if B) has to evaluate 200 rows rather than 50, and more rows will likely survive to the third test (if C) than in the first case. That amounts to lower efficiency.
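A minimal sketch to check this reasoning by counting how often each test runs. The predicates are hypothetical stand-ins for A, B, and C, chosen so that over rows 0 to 999 they are true for 50, 100, and 200 rows. Unlike the example above, these sets are nested, so 50 rows pass all three; only the call counts matter here.

```python
from collections import Counter

# Hypothetical predicates standing in for A, B and C (not the real tests).
# Over rows 0..999 they are true for 50, 100 and 200 rows respectively.
def make_tests(counter):
    def a(row):
        counter["A"] += 1
        return row % 20 == 0

    def b(row):
        counter["B"] += 1
        return row % 10 == 0

    def c(row):
        counter["C"] += 1
        return row % 5 == 0

    return {"A": a, "B": b, "C": c}


def count_evaluations(order, rows=range(1000)):
    """Run the tests in `order`, stopping at the first failure; count calls per test."""
    counter = Counter()
    tests = make_tests(counter)
    selected = 0
    for row in rows:
        # all() stops at the first False, just like nested ifs
        if all(tests[name](row) for name in order):
            selected += 1
    return selected, dict(counter)


print(count_evaluations("ABC"))  # (50, {'A': 1000, 'B': 50, 'C': 50})
print(count_evaluations("CBA"))  # (50, {'C': 1000, 'B': 200, 'A': 100})
```

Both orders select the same rows, but putting the most restrictive test first makes 1100 test calls instead of 1300.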
Do you agree?
Finally, what are the implications of coding it as above compared to a compound AND statement (e.g. if A and B and C)? I think I've run into problems with the latter because it doesn't behave in Python the way I expect. I may be wrong about that.
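A small sketch comparing the two forms. It uses a hypothetical check() helper that logs each test before returning its value. Python's and short-circuits, so if A and B and C stops at the first false operand, just as the nested ifs do.

```python
def check(name, value, log):
    """Record that a test ran, then return its result."""
    log.append(name)
    return value


def nested(a, b, c):
    log = []
    if check("A", a, log):
        if check("B", b, log):
            if check("C", c, log):
                return True, log
    return False, log


def compound(a, b, c):
    log = []
    # `and` evaluates left to right and stops at the first falsy operand
    if check("A", a, log) and check("B", b, log) and check("C", c, log):
        return True, log
    return False, log


print(nested(True, False, True))    # (False, ['A', 'B'])
print(compound(True, False, True))  # (False, ['A', 'B'])
```

For plain boolean tests the two forms run the same tests in the same order, so the choice is mostly about readability.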
Posts: 6,780
Threads: 20
Joined: Feb 2020
Without knowing A, B and C it is difficult to answer your question. It might be possible to make a hashable result, and if you can do that, you could use a dictionary to select which function to call.
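A minimal sketch of dictionary dispatch. It assumes the tests can be reduced to a hashable key, here the control_flag value that appears later in the thread. The handler names and bodies are hypothetical placeholders.

```python
# Hypothetical handlers standing in for the code each branch would run.
def handle_fl(fields):
    return "fl branch: " + fields[0]


def handle_fs(fields):
    return "fs branch: " + fields[0]


# The dictionary maps a hashable key to the function that handles it.
HANDLERS = {"fl": handle_fl, "fs": handle_fs}


def dispatch(control_flag, fields):
    handler = HANDLERS.get(control_flag)
    if handler is None:
        return None  # no handler for this key: skip the row
    return handler(fields)


print(dispatch("fs", ["SPX", "45"]))  # fs branch: SPX
print(dispatch("xx", ["SPX", "45"]))  # None
```

The key can also be a tuple of several test results, e.g. (control_flag, in_window), as long as every element is hashable.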
Maybe you can use a case statement. Again, I cannot say without knowing more about A, B, and C.
It is also difficult to say whether "and" will lead to confusion. Python's and is a little different. I think it is very useful, but it is not C's &&.
Several if statements can indicate that there are better ways to solve the problem.
Posts: 279
Threads: 107
Joined: Aug 2019
(Apr-28-2022, 02:21 PM)deanhystad Wrote: Without knowing A, B and C it is difficult to answer your question. It might be possible to make a hashable result, and if you can do that, you could use a dictionary to select which function to call. [...]
I'm trying to understand why the specific content of A, B, and C matters. They are logical statements to be evaluated. Maybe the complexity of the statements matters? That would make sense. That is, what matters is not only the number of rows for which A, B, and C are True but also how much work goes into evaluating each statement?
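Both factors matter. The sketch below uses made-up costs and pass rates, and it assumes the three tests are independent. It computes the average cost per row for every ordering. It then compares the best ordering with the standard rule for independent filters: sort by cost / (1 - pass rate), smallest first.

```python
from itertools import permutations

# Hypothetical numbers: cost = relative time to evaluate a test,
# p = fraction of rows that pass it. Tests are assumed independent.
TESTS = {
    "A": {"cost": 1.0, "p": 0.05},
    "B": {"cost": 5.0, "p": 0.10},
    "C": {"cost": 0.5, "p": 0.20},
}


def expected_cost(order):
    """Average cost per row when tests run in `order` and stop at the first failure."""
    total, reach = 0.0, 1.0  # reach = probability a row gets this far
    for name in order:
        total += reach * TESTS[name]["cost"]
        reach *= TESTS[name]["p"]
    return total


for order in permutations(TESTS):
    print("".join(order), round(expected_cost(order), 4))

# Rule for independent tests: sort by cost / (1 - p), smallest first.
rank_order = sorted(TESTS, key=lambda n: TESTS[n]["cost"] / (1 - TESTS[n]["p"]))
print("rank order:", "".join(rank_order))  # rank order: CAB
```

With these numbers the cheap test C goes first, even though A is more restrictive. "Most restrictive first" is only the whole story when the tests cost about the same.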
I don't know where hashable or dictionaries enter into this.
This shows A, B, and C:
bf = open(file, "r")
for line in bf:
    datalist = line.split(",")
    if control_flag == 'fl':
        if int(float(datalist[2])) >= mte * 30 and int(float(datalist[2])) < (((mte + 1) * 30) + 5):
            if float(datalist[6]) % 10 == 0:
                if float(datalist[6]) > float(datalist[3]) and float(datalist[6]) % 10 == 0 and float(datalist[6]) - float(datalist[3]) < 11:
                    <instructions>
    if control_flag == 'fs':
        <...>
Looking closer, I realize B is included in C, so B is redundant. A and C are also AND statements. All of them must be true for this branch to execute.
Posts: 6,780
Threads: 20
Joined: Feb 2020
Apr-28-2022, 04:11 PM
Python can use anything in an if statement, not just boolean values or expressions that have a boolean result. A Python "and" or "or" may produce a non-boolean result, like a list or a string. That is why the nature of A, B and C is important in knowing how they can be used.
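A short illustration of these rules. In C, && always yields 0 or 1. In Python, and and or return one of their operands, and if tests any object for truthiness.

```python
# `and` returns the first falsy operand, or the last operand if all are truthy.
print([] and "text")      # []
print([1, 2] and "text")  # text

# `or` returns the first truthy operand, or the last operand if none is truthy.
print("" or 0)            # 0
print("" or "default")    # default

# Empty containers, 0, 0.0, "" and None are false; most other values are true.
for value in ([], [0], 0.0, "0", None):
    print(repr(value), "->", bool(value))
```

Note that the string "0" is truthy. A field read from a .csv file is a string until it is converted.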
You could use the match statement (new in Python 3.10):
match control_flag:
    case "fl":
        ...  # more tests
    case "fs":
        ...  # more tests
If you are worried about efficiency, you should avoid lots of unnecessary conversions. Why int(float(datalist[2]))? You can compare float(datalist[2]) directly against the int mte * 30. And I don't know whether Python's code generator is smart enough to avoid evaluating float(datalist[6]) three times in this expression:
if float(datalist[6]) > float(datalist[3]) and float(datalist[6]) % 10 == 0 and float(datalist[6]) - float(datalist[3]) < 11:
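CPython does not cache the result of a function call, so each float(...) in that expression is evaluated again. A minimal sketch of converting each field once. The sample row is hypothetical; only the column positions 3 and 6 come from the code above.

```python
def original_test(datalist):
    # The expression from the post: float() is called on the same fields repeatedly.
    return (float(datalist[6]) > float(datalist[3])
            and float(datalist[6]) % 10 == 0
            and float(datalist[6]) - float(datalist[3]) < 11)


def parsed_once_test(datalist):
    # Same logic, but each field is converted to float only once.
    x6 = float(datalist[6])
    x3 = float(datalist[3])
    return x6 > x3 and x6 % 10 == 0 and x6 - x3 < 11


row = "SPX,C,45,4123.5,0,0,4130".split(",")  # hypothetical row layout
print(original_test(row), parsed_once_test(row))  # True True
```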
Posts: 4,784
Threads: 76
Joined: Jan 2018
You could compute the bounds once and for all:
mtelb, mteub = mte * 30, (mte + 1) * 30 + 5
then use
if mtelb <= int(float(datalist[2])) < mteub: ...
The test if float(datalist[6]) % 10 == 0 is questionable, because one usually does not compare floating-point values for equality with 0. Is it really what you want?
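A small sketch showing that the chained comparison matches the original two-part test. The value mte = 2 is an example taken from a later post in this thread. Python treats a <= x < b as (a <= x) and (x < b), with x evaluated only once.

```python
mte = 2  # example value
mtelb, mteub = mte * 30, (mte + 1) * 30 + 5  # computed once, before the loop


def in_window_long(x):
    # Form from the original post: two comparisons joined by `and`.
    return x >= mte * 30 and x < (((mte + 1) * 30) + 5)


def in_window_chained(x):
    # Chained comparison using the precomputed bounds.
    return mtelb <= x < mteub


for value in (59, 60, 94, 95):
    print(value, in_window_long(value), in_window_chained(value))
```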
Posts: 279
Threads: 107
Joined: Aug 2019
(Apr-28-2022, 04:11 PM)deanhystad Wrote: [...] If you are worried about efficiency you should not do lots of unnecessary conversions. Why int(float(datalist[2]))? [...]
Thanks for the explanation.
My reason for those conversions was to strip the decimal portion from some of the generated output. Some fields of the .csv file couldn't be converted by int() directly, though, so I used int(float()).
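Why int() alone fails on some fields: int() only accepts strings that look like integers, so "60.0" must pass through float() first. Also, int() truncates toward zero, while round() goes to the nearest integer.

```python
text = "60.0"
try:
    int(text)
except ValueError as exc:
    print("int() rejects it:", exc)

print(int(float(text)))     # 60
print(int(float("-2.7")))   # -2  (int() truncates toward zero)
print(round(float("2.7")))  # 3   (round() goes to the nearest integer)
```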
Posts: 279
Threads: 107
Joined: Aug 2019
(Apr-28-2022, 04:28 PM)Gribouillis Wrote: [...] if float(datalist[6]) % 10 == 0 is questionable because usually one does not compare equality of floating point values with 0. Is it really what you want?
Thanks for the suggestions.
For the last one, I'm looking for multiples of 10. Is there a better way to test for that?
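Two possible approaches, sketched with hypothetical helper names. The first allows a small tolerance for floating-point error. The second uses the standard decimal module, which parses a string such as "4130.0" exactly, so % 10 == 0 is an exact test on the text in the file.

```python
import math
from decimal import Decimal


def is_multiple_of_10_float(x, tol=1e-9):
    """Hypothetical helper: True if x is within tol of a multiple of 10."""
    value = float(x)
    # Compare against the nearest multiple of 10 with an absolute tolerance.
    return math.isclose(value, round(value / 10) * 10, rel_tol=0.0, abs_tol=tol)


def is_multiple_of_10_exact(text):
    """Hypothetical helper: exact test on the decimal text from the .csv file."""
    return Decimal(text) % 10 == 0


for s in ("4130", "4130.0", "4125", "4129.99"):
    print(s, is_multiple_of_10_float(s), is_multiple_of_10_exact(s))

# A value that picked up rounding error in earlier float arithmetic:
computed = (0.1 + 0.2) * 100  # not exactly 30.0
print(computed % 10 == 0, is_multiple_of_10_float(computed))
```

The tolerance version also accepts values that drifted slightly through arithmetic. The Decimal version is exact, but only works on the original strings.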
Posts: 6,780
Threads: 20
Joined: Feb 2020
Quote: My thought on those conversions was to eliminate the decimal portion of some of the output generated. Some fields of the .csv file couldn't be converted by int() directly, though. I therefore used int(float()).
Why do you care about the decimal portion?
mte = 2
lower_bounds = mte * 30
upper_bounds = (mte + 1) * 30 + 5
for d2 in ("59.9", "60.0", "60.1", "94.9", "95", "95.1"):
    print(d2, lower_bounds <= float(d2) < upper_bounds, lower_bounds <= int(float(d2)) < upper_bounds)
Output:
59.9 False False
60.0 True True
60.1 True True
94.9 True True
95 False False
95.1 False False
The decimal portion has no effect on the result of the comparisons.
Posts: 6,780
Threads: 20
Joined: Feb 2020
Maybe you should use pandas or numpy. This does not look like the optimal way to do anything.
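A rough sketch of the 'fl' filter in pandas, assuming pandas is installed. The inline CSV is hypothetical sample data. With header=None, columns are labeled 0, 1, 2, ..., so df[2], df[3] and df[6] are the same fields as datalist[2], [3] and [6]. Each comparison builds a boolean Series for the whole file, and & combines them element-wise. This is usually much faster than testing rows one at a time in a Python loop.

```python
import io

import pandas as pd

# Hypothetical stand-in for the options .csv file.
CSV_TEXT = """SPX,C,45,4123.5,0,0,4130
SPX,C,62,4123.5,0,0,4130
SPX,C,70,4123.5,0,0,4125
SPX,C,80,4128.0,0,0,4120
SPX,C,90,4121.0,0,0,4130
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), header=None)

mte = 2
lower, upper = mte * 30, (mte + 1) * 30 + 5

# Element-wise tests; use & (not `and`) and parenthesize each comparison.
# The % 10 == 0 test carries the same floating-point caveat as in the loop version.
mask = (
    (df[2] >= lower) & (df[2] < upper)
    & (df[6] % 10 == 0)
    & (df[6] > df[3])
    & (df[6] - df[3] < 11)
)
selected = df[mask]
print(selected)
```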
Posts: 279
Threads: 107
Joined: Aug 2019
(Apr-28-2022, 06:42 PM)deanhystad Wrote: Why do you care about the decimal portion? [...] The decimal portion has no effect on the result of the comparisons.
The program generates a results file with various trade statistics and parameters. That is where I was getting a lot of unnecessary decimal places, which I figured I could clean up by converting to int.
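Another option is to keep full precision in the calculations and control the decimals only when writing the results file, using format specifications. The values and field layout below are hypothetical.

```python
# Hypothetical result values; calculations keep full precision.
strike = 4130.0
pnl = 12.3456789
dte = 62.0

# Format only at output time:
#   .0f -> no decimals (rounded), .2f -> two decimals, g -> drops trailing zeros
line = f"{strike:.0f},{pnl:.2f},{dte:g}"
print(line)  # 4130,12.35,62
```

This keeps the comparisons in the loop independent of how the numbers look in the report.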