Python Forum

I am using the following code to compare each entry in one array with the entries of another. If the value of an array element falls within the range between two successive entries of the other array, a row is written to the file with the array index, the array element and the range number. But it takes too much time for large (1 GB) files, from which the arrays are extracted using a pandas DataFrame.

for i in range (0,len(array)):
	for j in range (0, len(array2)-1):
			
			if (array[i]< array2[j+1]) and (array[i]> array2[j]) :
				print array[i], array2[j]
				filewriter.writerow([i,array[i],j])
				break

Even if the following does not save significant execution time, at least it will be more Pythonic:

for next_index, value2 in enumerate(itertools.islice(array2, len(array2)-1), 1):
    for value in array:
        if value2 < value < array2[next_index]:
...........

Iteration over indices is un-Pythonic and inefficient

In [22]: l
Out[22]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [23]: %%timeit 
    ...: for v in l:
    ...:     x = v
    ...: 
177 ns ± 1.14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [24]: %%timeit 
    ...: for i in range(len(l)):
    ...:     x = l[i]
    ...: 
763 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

With indices, access is about 4 times slower

But the worst bottleneck will be print
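To illustrate the point about print, here is a minimal sketch that drops the per-match print and hands all matching rows to the csv writer in a single call. The helper name write_matches and the in-memory buffer are assumptions made for the example; in the real script the rows would go to the already open output file.

```python
import csv
import io

def write_matches(rows, out):
    """Write (index, value, range_number) rows in one call, with no per-row print."""
    writer = csv.writer(out)
    writer.writerows(rows)

# io.StringIO stands in for the real output file (assumption for this example)
buffer = io.StringIO()
write_matches([(0, 5.0, 0), (2, 25.0, 2)], buffer)
```

If progress output is still wanted, printing once every few thousand rows is usually enough.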

Let's try the power of numpy:

import numpy as np

limits = np.array(array2)
for k, a in enumerate(array):
    msk = np.logical_and(limits[:-1] <= a, a <= limits[1:])
    if msk.any():
        p = msk.argmax()  # index of the first matching range
        print(f"{k}: {a} at array2[{p}]")

The performance will strongly depend on the size of array2 compared to array, though. Here I assume that array2 is much smaller, so performing some additional comparisons is better than trying to break at the first match.
If array2 is much bigger, then you might prefer to do it the opposite way (for each pair array2[k], array2[k+1], test all the elements of array at once).
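The opposite way could look roughly like the sketch below: loop over the intervals and do one vectorized comparison over all the values per interval. The function name match_by_interval is made up for the example, and it uses the strict comparisons from the original question rather than the <= used above.

```python
import numpy as np

def match_by_interval(values, limits):
    """For each interval (limits[j], limits[j+1]), collect the values strictly inside it."""
    values = np.asarray(values)
    limits = np.asarray(limits)
    rows = []
    for j in range(len(limits) - 1):
        # one vectorized comparison over the whole of `values` per interval
        hits = np.nonzero((limits[j] < values) & (values < limits[j + 1]))[0]
        rows.extend((int(i), values[i].item(), j) for i in hits)
    return rows

rows = match_by_interval([5, 10, 25, 35], [0, 10, 20, 30])
```

Note that the rows come out grouped by interval, not in the order of the first array, so sort them by index afterwards if the order matters.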

(May-17-2018, 02:34 PM)killerrex Wrote:
...
for k, a in array:
...

enumerate ?!

Yes, I forgot the enumerate! (I was copying and pasting from different versions of the script.)
I have edited the post so that the code is correct.

Iterating over the lists directly and saving the previous value should/may be quicker. An alternative is to break the data into pieces, so that all values < x in both lists go into a second, smaller pair of lists.

## something like
for item_1 in array:
    previous = array2[0]  ## reset for every item so the lower bound is always valid
    for item_2 in array2[1:]:
        if previous < item_1 < item_2:
        ##if item_1 < item_2 and item_1 > previous:
            print(item_1, previous)
            break
        previous = item_2
        ## etc
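If array2 happens to be sorted in ascending order (an assumption, the question does not say so), the inner scan can be replaced by a binary search. The sketch below uses np.searchsorted for that; the function name find_ranges is invented for the example, and it keeps the strict comparisons of the original loop.

```python
import numpy as np

def find_ranges(values, limits):
    """Return (indices, values, range numbers) for values strictly inside an interval.

    Assumes `limits` is non-empty and sorted in ascending order.
    """
    values = np.asarray(values)
    limits = np.asarray(limits)
    # pos[i] is the first index where limits[pos[i]] >= values[i]
    pos = np.searchsorted(limits, values, side="left")
    # drop values at or below the first limit, or above the last one
    inside = (pos > 0) & (pos < len(limits))
    # drop values equal to a limit, so both comparisons stay strict
    inside &= limits[np.minimum(pos, len(limits) - 1)] != values
    idx = np.nonzero(inside)[0]
    return idx, values[idx], pos[idx] - 1

idx, vals, ranges = find_ranges([5, 10, 25, 35, -1, 15], [0, 10, 20, 30])
```

The three returned arrays line up row by row, so they can be zipped and passed to the csv writer in one go. With an unsorted array2 this does not give the same answer as the nested loops.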

This is the case where using Cython will be very useful.
So, rewrite array comparisons in Cython (Cython supports numpy arrays):

# This code snippet should be saved as a separate *.pyx file, e.g. utils.pyx
# NOT TESTED, tweak data types, read cython docs...
import numpy as np

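# double[:] is a typed memoryview; callers must pass float64 arrays
cpdef do_comparison(double[:] array1, double[:] array2):
    cdef int i = 0
    cdef int j = 0
    cdef int n = len(array1)
    cdef int m = len(array2)
    # one row per element of array1: (index, value, range number);
    # rows without a match stay all zeros
    result = np.zeros((n, 3), dtype=np.double)
    for i in range(n):
        for j in range(m - 1):
            if (array1[i] < array2[j + 1]) and (array1[i] > array2[j]):
                result[i, 0] = i
                result[i, 1] = array1[i]
                result[i, 2] = j
                break  # stop at the first matching range, as in the original loop
    return result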

Now, compile utils.pyx.... (See Cython docs)

# in your main program:

from utils import do_comparison

# some stuff that fills array1 and array2

result = do_comparison(array1, array2)

# result is a numpy array; 
# save result to a file if needed

dilmailid

volcano63

killerrex

volcano63

killerrex

woooee

scidam