Python Forum
how to reduce running time of the code
#1
I am using the following code to compare each entry in one array with the entries of another. If an element of array falls within the range between two successive entries of array2, a row is written to the file with the array index, the array element and the range number. But it takes too much time for large (1 GB) files, from which the arrays are extracted using a pandas DataFrame.

for i in range(len(array)):
    for j in range(len(array2) - 1):
        if array2[j] < array[i] < array2[j + 1]:
            print(array[i], array2[j])
            filewriter.writerow([i, array[i], j])
            break
Reply
#2
Even if the following does not save significant execution time, it is at least more Pythonic:
import itertools

for next_index, value2 in enumerate(itertools.islice(array2, len(array2)-1), 1):
    for value in array:
        if value2 < value < array2[next_index]:
...........
Iteration over indices is un-Pythonic and inefficient
Output:
In [22]: l
Out[22]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [23]: %%timeit
    ...: for v in l:
    ...:     x = v
    ...:
177 ns ± 1.14 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [24]: %%timeit
    ...: for i in range(len(l)):
    ...:     x = l[i]
    ...:
763 ns ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
With indices, access is about 4 times slower in this measurement.

But the worst bottleneck will be the print call inside the loop.
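A minimal sketch of what dropping the per-row print could look like: collect the matching rows first and hand them to the csv writer in one writerows call. The helper name write_matches and the in-memory buffer are only for illustration; in the real script the output would be the file behind filewriter.

```python
import csv
import io


def write_matches(rows, out):
    """Write all (index, value, range number) rows in one call, with no per-row print."""
    writer = csv.writer(out)
    writer.writerows(rows)


# An in-memory buffer stands in for the real output file here.
buffer = io.StringIO()
write_matches([(0, 5, 0), (3, 25, 2)], buffer)
```

Whether this helps noticeably depends on how many rows match; it is worth timing both versions on a sample of the data.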
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#3
Let's try the power of numpy:
import numpy as np

limits = np.array(array2)
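for k, a in enumerate(array):
    # Note: <= also matches values equal to a limit, unlike the strict test in the first post
    msk = np.logical_and(limits[:-1] <= a, a <= limits[1:])
    if msk.any():
        p = msk.argmax()  # index of the first matching range
        print(f"{k}: {a} at array2[{p}]")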
The performance will depend strongly on the size of array2 compared to array, though. Here I assume that array2 is much smaller, so performing some additional comparisons is better than trying to break at the first match.
If array2 is much bigger, you might prefer to do it the opposite way (for each pair array2[k], array2[k+1], test all the elements of array at once).
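If array2 happens to be sorted in ascending order (an assumption, the first post does not say so), the Python-level loop can be removed entirely with a binary search. The sketch below uses np.searchsorted and keeps the strict comparison from the first post; the function name find_intervals is made up for this example.

```python
import numpy as np


def find_intervals(values, limits):
    """Return (index, value, j) rows where limits[j] < value < limits[j + 1].

    Assumes `limits` is sorted ascending and has at least two entries.
    """
    values = np.asarray(values)
    limits = np.asarray(limits)
    # For each value, the first position whose limit is >= value (binary search).
    idx = np.searchsorted(limits, values, side="left")
    # A strict match needs a lower neighbour (idx >= 1) and an upper bound
    # inside the array (idx < len(limits)).
    inside = (idx >= 1) & (idx < len(limits))
    # The value must also be strictly below its upper bound, which rules out
    # values that are equal to one of the limits.
    upper = limits[np.minimum(idx, len(limits) - 1)]
    inside &= values < upper
    hits = np.nonzero(inside)[0]
    return [(int(i), values[i].item(), int(idx[i]) - 1) for i in hits]


rows = find_intervals([5, 1, 10, 25, 0, 40], [1, 10, 20, 30])
```

Each lookup costs O(log m) instead of O(m) for m limits, which should matter most when both arrays are large. If array2 is not sorted, this does not apply as written.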
Reply
#4
(May-17-2018, 02:34 PM)killerrex Wrote:
...
for k, a in array:
...
enumerate ?!
Reply
#5
Yes, I forgot the enumerate! (I was copying and pasting from different versions of the script.)
I have edited the post so that the code is correct.
Reply
#6
Iterating over the lists directly and saving the previous value should/may be quicker. An alternative is to break the data into pieces: all values below some x, from both lists, go into a second pair of lists, so each piece can be compared separately.
## something like
for item_1 in array:
    previous = array2[0]  ## reset the lower bound for every item_1
    for item_2 in array2[1:]:
        if previous < item_1 < item_2:
        ##if item_1 < item_2 and item_1 > previous:
            print(item_1, item_2)
        previous = item_2
        ## etc
Reply
#7
This is the case where using Cython will be very useful.
So, rewrite array comparisons in Cython (Cython supports numpy arrays):

# This code snippet should be saved as a separate *.pyx file, e.g. utils.pyx
# NOT TESTED, tweak data types, read cython docs...
import numpy as np

cpdef do_comparison(double[:] array1, double[:] array2):
    cdef int i = 0
    cdef int j = 0
    cdef int n = array1.shape[0]
    cdef int m = array2.shape[0]
    # One row per element of array1: (index, value, range number).
    # Rows for elements that match no range stay all zeros.
    result = np.zeros((n, 3), dtype=np.double)
    for i in range(n):
        for j in range(m - 1):
            if (array1[i] < array2[j + 1]) and (array1[i] > array2[j]):
                result[i, 0] = i
                result[i, 1] = array1[i]
                result[i, 2] = j
                break  # stop at the first matching range, as in the original loop
    return result
Now compile utils.pyx (see the Cython docs).

# in your main program:

from utils import do_comparison

# some stuff that fills array1 and array2

result = do_comparison(array1, array2)

# result is a numpy array; 
# save result to a file if needed
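For the "save result to a file" step, one option is np.savetxt. This is only a sketch: the small array below is a made-up stand-in for what do_comparison would return, the column names are invented, and an in-memory buffer replaces the real output file.

```python
import io

import numpy as np

# Hypothetical stand-in for the array returned by do_comparison.
result = np.array([[0.0, 5.0, 0.0], [3.0, 25.0, 2.0]])

# In the real program, pass a file name or an open file instead of the buffer.
out = io.StringIO()
np.savetxt(
    out,
    result,
    delimiter=",",
    fmt="%g",                       # compact number formatting
    header="index,value,interval",  # illustrative column names
    comments="",                    # no "# " prefix on the header line
)
```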
Reply

