I am using the following code to compare each entry in one array against another array. If the value of an array element falls within the range between two successive entries of the other array, a row is written to the file with the array index, the array element, and the range number. However, it takes too much time for large (1 GB) files, from which the arrays are extracted using a pandas DataFrame.
    for i in range(0, len(array)):
        for j in range(0, len(array2) - 1):
            if (array[i] < array2[j+1]) and (array[i] > array2[j]):
                print array[i], array2[j]
                filewriter.writerow([i, array[i], j])
                break
Even if the following does not save significant execution time, it is at least more Pythonic:
    import itertools

    for next_index, value2 in enumerate(itertools.islice(array2, len(array2)-1), 1):
        for value in array:
            if value2 < value < array2[next_index]:
                ...........
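As a rough sketch of the same idea with no index arithmetic at all, successive entries of array2 can be paired with zip. The function name `iter_matches` is hypothetical, and the sketch keeps the question's loop order (one pass over array2 per element, stopping at the first match) rather than the order used above.

```python
def iter_matches(array, array2):
    """Yield (index, value, range number) for each value that lies strictly
    between two successive entries of array2.

    Hypothetical helper for illustration; not part of the original code.
    """
    for i, value in enumerate(array):
        # zip pairs array2[j] with array2[j + 1] without indexing by hand
        for j, (low, high) in enumerate(zip(array2, array2[1:])):
            if low < value < high:
                yield i, value, j
                break  # stop at the first matching range, as in the question


rows = list(iter_matches([0.5, 1.5, 2.0, 2.7], [1.0, 2.0, 3.0]))
```

This is still quadratic in the worst case, so it mainly improves readability, not the running time.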
Iteration over indices is un-Pythonic and inefficient
Output:
    In [22]: l
    Out[22]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    In [23]: %%timeit
        ...: for v in l:
        ...:     x = v
        ...:
    177 ns ± 1.14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    In [24]: %%timeit
        ...: for i in range(len(l)):
        ...:     x = l[i]
        ...:
    763 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
With indices, access is about 4 times slower. But the worst bottleneck will be print.
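One way to act on this, sketched under the assumption that the matching rows fit in memory, is to drop the per-match print and hand all rows to the csv writer in a single writerows call. The function name `write_matches` and the sample rows are made up for illustration; an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_matches(rows, stream):
    """Write all (index, value, range number) rows at once, without printing."""
    writer = csv.writer(stream)
    writer.writerows(rows)


# Illustrative rows only; in the real program they come from the comparison loop.
buffer = io.StringIO()
write_matches([(0, 1.5, 0), (5, 2.7, 1)], buffer)
```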
Let's try the power of numpy:
    import numpy as np

    limits = np.array(array2)
    for k, a in enumerate(array):
        # True where a lies between two successive limits
        msk = np.logical_and(limits[:-1] <= a, a <= limits[1:])
        if msk.any():
            p = msk.argmax()  # index of the first matching range
            print(f"{k}: {a} at array2[{p}]")
The performance will strongly depend on the size of array2 compared to array, though. Here I assume that the size of array2 is much smaller, so performing some additional comparisons is better than trying to break at the first match.
If array2 is much bigger, you might prefer to do it the opposite way (test each pair array2[k], array2[k+1] against all the elements in array).
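If array2 happens to be sorted in strictly increasing order (an assumption; the question does not say so), the Python-level loop can be removed entirely with a binary search via np.searchsorted. This is a different technique from the mask loop above, shown only as a sketch; the function name `find_ranges` is hypothetical. It uses the strict comparisons from the question.

```python
import numpy as np


def find_ranges(values, limits):
    """Return (indices, matched values, range numbers) for every value with
    limits[j] < value < limits[j + 1].

    Assumes limits is sorted in strictly increasing order.
    """
    values = np.asarray(values)
    limits = np.asarray(limits)
    # side="right" counts the limits that are <= value, so j is the index
    # of the last limit not greater than the value (or -1 if there is none)
    j = np.searchsorted(limits, values, side="right") - 1
    # j must point at a limit that has a successor
    inside = (j >= 0) & (j < len(limits) - 1)
    # enforce the strict lower bound; clip keeps the lookup in range
    inside &= limits[np.clip(j, 0, len(limits) - 1)] < values
    idx = np.nonzero(inside)[0]
    return idx, values[idx], j[idx]


idx, matched, ranges = find_ranges([0.5, 1.5, 2.0, 3.5, -1.0, 2.7], [1.0, 2.0, 3.0])
```

The three returned arrays can then be written to the file in one go, for example by zipping them into rows.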
Yes, I forgot the enumerate! (I was copying and pasting from different versions of the script.)
I have edited it to leave the code clear.
Iterating over the lists and saving the previous value should/may be quicker. An alternative is to break the data into pieces, so that all values < x in both lists go into a second set of lists.
    ## something like
    previous=array[0]  ## so you don't get a false positive on the first pass
    for item_1 in array:
        for item_2 in array2:
            if previous < item_1 < item_2 :
            ##if item_1 < item_2 and item_1 > previous:
                print item_1, item_2
            previous=item_2
    ## etc
This is a case where using Cython will be very useful.
So, rewrite the array comparisons in Cython (Cython supports numpy arrays):
    # This code snippet should be saved as a separate *.pyx file, e.g. utils.pyx
    # NOT TESTED, tweak data types, read cython docs...
    import numpy as np

    cpdef do_comparison(double[:] array1, double[:] array2):
        cdef int i = 0
        cdef int j = 0
        cdef int n = len(array1)
        cdef int m = len(array2)
        # one row per element of array1: (index, value, range number);
        # rows without a match stay zero
        result = np.zeros((n, 3), dtype=np.double)
        for i in range(n):
            for j in range(m - 1):
                if (array1[i] < array2[j + 1]) and (array1[i] > array2[j]):
                    result[i, 0] = i
                    result[i, 1] = array1[i]
                    result[i, 2] = j
                    break  # stop at the first matching range
        return result
Now, compile utils.pyx .... (see the Cython docs)
    # in your main program:
    from utils import do_comparison
    # some stuff that fills array1 and array2
    result = do_comparison(array1, array2)
    # result is a numpy array;
    # save result to a file if needed