I noticed a mistake in my previous code; please consider the following version, which looks for a pattern of 1000 components in a vector of 3 million elements (it took 62 s on my old laptop).

In my code:
- vect4 is the pattern
- in the loop, I extract a slice of vect3 with the same size as the pattern and compare it to vect4 (True or False)
- at the same time, the "check" vector (made of booleans) is updated
- at the end, I look for the "True" values to get both the positions and the number of occurrences of the pattern
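To make these steps concrete, here is a minimal sketch of the same idea on a tiny vector. The function name `find_pattern_loop` and the toy data are made up for illustration; only the approach (preallocated boolean vector, window-by-window comparison, then a search for True) comes from the description above.

```python
import numpy as np

def find_pattern_loop(vect, pattern):
    """Return the start positions where `pattern` occurs in `vect`."""
    n = len(pattern)
    n_windows = len(vect) - n + 1            # number of full-size windows
    if n_windows <= 0:
        return np.array([], dtype=int)
    check = np.zeros(n_windows, dtype=bool)  # allocated once, False by default
    for i in range(n_windows):
        # compare the window starting at i with the pattern
        check[i] = np.array_equal(vect[i:i + n], pattern)
    return np.flatnonzero(check)             # indices of the True entries

vect = np.array([0, 1, 1, 0, 1, 1, 0, 0])
pattern = np.array([1, 1, 0])
positions = find_pattern_loop(vect, pattern)
print(positions)  # [1 4]
```

The pattern `[1, 1, 0]` starts at indices 1 and 4 of the toy vector, so two occurrences are reported.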
Some remarks:
- I've not been able to avoid the loop
- this holds in Python and in other languages too: avoid dynamic memory allocation to speed up your code
- I guess we can adapt it for
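On the first remark, one possible way to push the comparison into NumPy is `numpy.lib.stride_tricks.sliding_window_view` (available from NumPy 1.20). This is only a sketch under assumptions: the function name `find_pattern_windows` and the `chunk` parameter are hypothetical, and a short loop over chunks is kept on purpose, because comparing all 3 million windows of length 1000 in one shot would need a temporary boolean array of about 3 GB.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_pattern_windows(vect, pattern, chunk=10_000):
    """Return the start positions of `pattern` in `vect` (chunked comparison)."""
    n = len(pattern)
    n_windows = len(vect) - n + 1
    if n_windows <= 0:
        return np.array([], dtype=int)
    # Read-only view of shape (n_windows, n); no data is copied here.
    windows = sliding_window_view(vect, n)
    hits = []
    for start in range(0, n_windows, chunk):
        block = windows[start:start + chunk]     # still a view
        match = (block == pattern).all(axis=1)   # temporary of size chunk x n
        hits.append(start + np.flatnonzero(match))
    return np.concatenate(hits)

# Smaller test data than in the post, so that it runs quickly.
rng = np.random.default_rng(0)
vect = rng.integers(2, size=50_000)
pattern = rng.integers(2, size=100)
for pos in (1, 6596, 10023):                     # occurrences created manually
    vect[pos:pos + len(pattern)] = pattern
positions = find_pattern_windows(vect, pattern)
print("Positions:", positions)
```

The Python-level loop now runs once per chunk instead of once per position. I have not timed it against the 62 s figure above, so treat any speed-up as something to measure on your own data.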
hope it helps
import time
import numpy as np

n = 1000
vect3 = np.random.randint(2, size=3000 * n)
vect4 = np.random.randint(2, size=n)

# 4 occurrences are created manually
vect3[1:1 + n] = vect4
vect3[6596:6596 + n] = vect4
vect3[10023:10023 + n] = vect4
vect3[569872:569872 + n] = vect4

# initialization: no match by default
check = np.full(3000 * n, False, dtype=bool)

# partially vectorized
t0 = time.time()
# stop at the last position where a full-size window still fits
for i in range(3000 * n - n + 1):
    check[i] = np.array_equiv(vect3[i:i + n], vect4)

# look for the True entries and their positions
# (np.where returns a tuple, so take its first element)
sol = np.where(check)[0]
if sol.size == 0:
    print("No match")
else:
    print("There are %d match(es)" % sol.size)
    print("Positions: %s" % sol)
t1 = time.time()
print("Duration method: ", t1 - t0)