Jun-26-2020, 10:00 AM
(This post was last modified: Jun-26-2020, 10:03 AM by Gribouillis.)
Before going to memory mapping, which may not change much because the problem doesn't seem to be the time spent accessing the files, it seems to me that the mathematical algorithm could be improved by first sorting the arrays of numbers and using 3 pointers:
As
Depending on your data, these algorithmic changes could dramatically cut the numbers of python loops and the number of numerical tests in the execution of the program.
- The first pointer is the current element x[i] in the first array of numbers.
- The second pointer is the first element y[j] in the second array such that y[j] >= x[i] - 4000
- The third pointer is the first element y[k] in the second array such that y[k] > x[i] + 4000
x[i] - y[n]
must be appended to the result for n in range(j, k).As
i
increases, j and k also increase, their new values could be determined by binary search in the second array.Depending on your data, these algorithmic changes could dramatically cut the numbers of python loops and the number of numerical tests in the execution of the program.