how to reduce running time of the code

dilmailid · May-17-2018, 09:22 AM

I am using the following code to compare each entry in one array with the entries of another. If the value of an array element falls within the range between two successive entries of the other array, a row is written to the file with the array index, the array element, and the range number. But it takes too much time for large (1 GB) files, from which the arrays are extracted using a pandas DataFrame.

for i in range(0, len(array)):
    for j in range(0, len(array2) - 1):
        if (array[i] < array2[j + 1]) and (array[i] > array2[j]):
            print array[i], array2[j]
            filewriter.writerow([i, array[i], j])
            break

volcano63 · May-17-2018, 01:50 PM

Even if the following does not save significant execution time, at least it is more Pythonic:

import itertools

for next_index, value2 in enumerate(itertools.islice(array2, len(array2)-1), 1):
    for value in array:
        if value2 < value < array2[next_index]:
...........

Iteration over indices is un-Pythonic and inefficient

In [22]: l
Out[22]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [23]: %%timeit 
    ...: for v in l:
    ...:     x = v
    ...: 
177 ns ± 1.14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [24]: %%timeit 
    ...: for i in range(len(l)):
    ...:     x = l[i]
    ...: 
763 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

With indices, access is about 4 times slower (763 ns vs. 177 ns)

But the worst bottleneck will be print
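
To illustrate the point about print, here is a minimal sketch that drops the per-match print, collects the rows in a list, and writes them with a single writerows call. The function name find_ranges and the sample data are hypothetical, and the matching logic is still the nested loop from the question. How much time this saves depends on the terminal and the data size.

```python
import csv
import io


def find_ranges(array, array2):
    """Return [index, value, range number] rows, as in the original loop."""
    rows = []
    for i, value in enumerate(array):
        # zip pairs each boundary with its successor: (array2[j], array2[j + 1])
        for j, (low, high) in enumerate(zip(array2, array2[1:])):
            if low < value < high:
                rows.append([i, value, j])
                break
    return rows


# Hypothetical sample data; in the question these come from a pandas DataFrame.
array = [5, 15, 25, 10]
array2 = [0, 10, 20, 30]

rows = find_ranges(array, array2)

# StringIO stands in for the real output file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # one call, no print inside the loop
```

The value 10 produces no row because the comparison is strict on both sides, as in the question.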

killerrex · May-17-2018, 02:34 PM (This post was last modified: May-17-2018, 06:00 PM by killerrex.)

Let's try the power of numpy:

import numpy as np

limits = np.array(array2)
for k, a in enumerate(array):
    msk = np.logical_and(limits[:-1] <= a, a <= limits[1:])
    if msk.any():
        p = msk.argmax()  # index of the first range that contains a
        print(f"{k}: {a} at array2[{p}]")

The performance will strongly depend on the size of array2 compared to array, though. Here I assume that array2 is much smaller, so performing some additional comparisons is better than trying to break at the first match.
If array2 is much bigger, you might prefer to do it the opposite way: for each pair array2[k], array2[k+1], test all the elements of array at once.
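
The opposite direction could look roughly like the sketch below: loop over the consecutive pairs of array2 and build one boolean mask over the whole of array per pair. The function name match_by_interval and the sample data are hypothetical. The bounds are strict, as in the question, rather than the <= used above.

```python
import numpy as np


def match_by_interval(array, array2):
    """For each pair (array2[j], array2[j + 1]), test every element of array at once."""
    values = np.asarray(array)
    limits = np.asarray(array2)
    rows = []
    for j in range(len(limits) - 1):
        # Boolean mask over all of `values` for the open interval number j.
        mask = (limits[j] < values) & (values < limits[j + 1])
        for i in np.nonzero(mask)[0]:
            rows.append((int(i), values[i].item(), j))
    return rows


# Hypothetical sample data.
rows = match_by_interval([5, 15, 25, 10], [0, 10, 20, 30])
```

Note that the rows come out ordered by range number, not by array index, and that an element is reported once per matching range. With a sorted array2 that is at most one range, which matches the effect of the break in the original loop.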

volcano63 · May-17-2018, 02:54 PM

(May-17-2018, 02:34 PM)killerrex Wrote:
...
for k, a in array:
...

enumerate ?!

killerrex · May-17-2018, 06:00 PM

Yes, I forgot the enumerate! (I was copying and pasting from different versions of the script.)
I have edited the post so that the code is correct.

woooee · (This post was last modified: May-17-2018, 06:37 PM by woooee.)

Iterating over the lists directly and saving the previous value should/may be quicker than indexing. An alternative is to break the problem into pieces, e.g. move all values < x from both lists into a second pair of lists, so that each piece is compared separately.

## something like
for item_1 in array:
    previous = array2[0]  ## reset the lower bound for every item_1
    for item_2 in array2[1:]:
        if previous < item_1 < item_2:
        ##if item_1 < item_2 and item_1 > previous:
            print(item_1, item_2)
            break
        previous = item_2
        ## etc

scidam · May-18-2018, 02:49 AM

This is a case where Cython will be very useful.
So, rewrite the array comparisons in Cython (Cython supports numpy arrays):

# This code snippet should be saved as a separate *.pyx file, e.g. utils.pyx
# NOT TESTED, tweak data types, read cython docs...
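import numpy as np

# double[:] is a typed memoryview; it accepts 1-D float64 numpy arrays
cpdef do_comparison(double[:] array1, double[:] array2):
    cdef Py_ssize_t i, j
    cdef Py_ssize_t n = array1.shape[0]
    cdef Py_ssize_t m = array2.shape[0]
    # one row per element of array1: (index, value, range number);
    # rows of elements that match no range stay zero
    result = np.zeros((n, 3), dtype=np.double)
    for i in range(n):
        for j in range(m - 1):
            if (array1[i] < array2[j + 1]) and (array1[i] > array2[j]):
                result[i, 0] = i
                result[i, 1] = array1[i]
                result[i, 2] = j
                break  # stop at the first match, as in the original loop
    return result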

Now, compile utils.pyx.... (See Cython docs)

# in your main program:

from utils import do_comparison

# some stuff that fills array1 and array2

result = do_comparison(array1, array2)

# result is a numpy array; 
# save result to a file if needed
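
If array2 happens to be sorted in ascending order (an assumption, the thread does not say so), a similar result array can be sketched in plain NumPy with np.searchsorted, which does a binary search per element instead of scanning every range. This is a different technique from the Cython loop above, not a translation of it. The function name ranges_by_searchsorted and the sample data are hypothetical, and array2 is assumed to hold at least two values.

```python
import numpy as np


def ranges_by_searchsorted(array1, array2):
    """Return rows (index, value, range number); assumes array2 is sorted ascending."""
    values = np.asarray(array1, dtype=np.double)
    limits = np.asarray(array2, dtype=np.double)
    # side="left": pos[i] is the number of limits strictly smaller than values[i],
    # so limits[pos - 1] < value <= limits[pos] whenever both indices exist.
    pos = np.searchsorted(limits, values, side="left")
    in_bounds = (pos > 0) & (pos < len(limits))
    # Clip only to keep the lookup valid; out-of-bounds entries are masked anyway.
    upper = limits[np.clip(pos, 0, len(limits) - 1)]
    # Strict upper bound, as in the original comparison.
    inside = in_bounds & (values < upper)
    idx = np.nonzero(inside)[0]
    return np.column_stack((idx, values[idx], pos[idx] - 1))


# Hypothetical sample data: 10 and 30 sit on a boundary, -1 and 35 are outside.
result = ranges_by_searchsorted([5, 15, 25, 10, -1, 30, 35], [0, 10, 20, 30])
```

Unlike do_comparison, this returns rows only for the elements that fall inside a range, which is closer to what the original filewriter loop writes. The array can then be saved with, for example, np.savetxt.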
