Posts: 300
Threads: 72
Joined: Apr 2019
Jul-16-2024, 08:56 AM
(This post was last modified: Jul-16-2024, 09:35 AM by paul18fr.)
Hi,
As the picture shows, I'm trying to find sets of data (array V: 1 row = 1 set) inside array M.
Note that the order of the data matters, i.e. the values in yellow are the expected ones, not the ones in green.
I cannot use np.intersect1d because it looks for single values one at a time, not for a set.
Of course the example has been simplified; in the real case I'm dealing with millions of rows for M and thousands for V, so performance is key.
Any hint is welcome.
Thanks for your time
Paul
import numpy as np

M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482],
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])

V = np.array([[3469, 3470],
              [3471, 3472],
              [3479, 3480]])

val, M_ind, V_ind = np.intersect1d(M[:, 1:], V, assume_unique=False, return_indices=True)
Attached Files
Thumbnail(s)
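To illustrate why np.intersect1d cannot express "this pair, in this order", here is a minimal sketch on two rows taken from the example above. It relies only on documented behaviour: both inputs are flattened, and with return_indices=True the index of the first occurrence of each common value is returned. The reduced M and V are just for illustration.

```python
import numpy as np

# Row 0 holds the pair in the expected order, row 1 holds it reversed.
M = np.array([[3301, 947, 898, 899, 945, 3467, 3468, 3469, 3470],
              [4301, 947, 898, 899, 945, 3467, 3468, 3470, 3469]])
V = np.array([[3469, 3470]])

# Both inputs are flattened, so the pair is handled as two independent values.
common, m_idx, v_idx = np.intersect1d(M[:, 1:], V, return_indices=True)
print(common)

# m_idx points into the flattened M[:, 1:]; map it back to (row, column).
rows, cols = np.unravel_index(m_idx, M[:, 1:].shape)
print(rows, cols)
```

Only the first occurrence of each value is reported, so the second row never shows up, and nothing in the result says whether 3469 and 3470 were adjacent or in which order.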
Posts: 300
Threads: 72
Joined: Apr 2019
Jul-16-2024, 02:01 PM
(This post was last modified: Jul-16-2024, 02:01 PM by paul18fr.)
This is the only way I've found so far, but it remains too slow, so I'm still looking for a NumPy solution.
Because it uses fewer type conversions, solution 2 is a bit faster than solution 1.
import numpy as np
import time

M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482],
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])
n1 = 1  # repetition factor for M (the 700000 rows timed below correspond to n1 = 100000)
M = np.repeat(M, n1, axis=0)

V = np.array([[3469, 3470],
              [3471, 3472],
              [3479, 3480]])
n2 = 1  # repetition factor for V (the 300 rows timed below correspond to n2 = 100)
V = np.repeat(V, n2, axis=0)

rM, cM = np.shape(M)
rV, cV = np.shape(V)
print(f"Number of iteration = {rM * rV}")


# Solution 1: compare tuples
def Intersect2D_1(Array_A, Array_B):
    def MatchWithTuples(A, B):
        # keep the values of B that also appear in A, then compare with A
        return A == tuple([b for b in B if b in A])

    ResultsList = [True if MatchWithTuples(A=y, B=x) else False
                   for x in Array_A for y in Array_B]
    return ResultsList


t0 = time.time()
M1 = tuple(map(tuple, M[:, 1:]))
V1 = tuple(map(tuple, V))
Result1 = np.asarray(Intersect2D_1(Array_A=M1, Array_B=V1))
Result1_reshaped = Result1.reshape(rM, rV)
Ind1 = np.unique(np.where(Result1_reshaped == True)[0])
t1 = time.time()
print(f"Solution #1: Duration for M[{rM}, {cM}] and V[{rV}, {cV}] = {t1 - t0}")
del M1, V1, Result1


# Solution 2: compare lists
def Intersect2D_2(Array_A, Array_B):
    def MatchWithLists(A, B):
        return A == [b for b in B if b in A]

    ResultsList = [True if MatchWithLists(A=y, B=x) else False
                   for x in Array_A for y in Array_B]
    return ResultsList


t2 = time.time()
M2 = M[:, 1:].tolist()
V2 = V.tolist()
Result2 = np.asarray(Intersect2D_2(Array_A=M2, Array_B=V2))
Result2_reshaped = Result2.reshape(rM, rV)
Ind2 = np.unique(np.where(Result2_reshaped == True)[0])
t3 = time.time()
print(f"Solution #2: Duration for M[{rM}, {cM}] and V[{rV}, {cV}] = {t3 - t2}")
del M2, V2, Result2

InvolvedElements = M[Ind2, 0]
print(f"\nInvolved row indexes = {Ind2}")
print(f"Involved elements = {InvolvedElements}")
Output: Number of iteration = 210000000
Solution #1: Duration for M[700000, 9] and V[300, 2] = 167.9941966533661
Solution #2: Duration for M[700000, 9] and V[300, 2] = 157.40234184265137
Involved row indexes = [ 0 1 2 ... 499997 499998 499999]
Posts: 300
Threads: 72
Joined: Apr 2019
Ahhh, I forgot a key point when playing with lists and tuples (see the MatchWithLists and MatchWithTuples functions): [1, 2, 3] and [1, 3] give the same result when I'm looking for the exact set [1, 3] => it's wrong!
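The false positive can be reproduced in a few lines. The sketch below restates the filtering rule used by MatchWithLists under a hypothetical name (filter_match) and compares it with a stricter, also hypothetical, adjacent_match that requires the values to appear as a contiguous, ordered slice. Neither name comes from the code above.

```python
def filter_match(v_row, m_row):
    """Rule used so far: keep the values of m_row found in v_row, then compare."""
    return v_row == [b for b in m_row if b in v_row]


def adjacent_match(v_row, m_row):
    """Stricter rule: v_row must appear as a contiguous, ordered slice of m_row."""
    n = len(v_row)
    return any(m_row[i:i + n] == v_row for i in range(len(m_row) - n + 1))


print(filter_match([1, 3], [1, 2, 3]))    # True: 2 sits in between, yet it matches
print(adjacent_match([1, 3], [1, 2, 3]))  # False
print(adjacent_match([1, 3], [0, 1, 3]))  # True
```

The slice comparison is still pure Python, so it only fixes correctness, not speed.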
Posts: 6,812
Threads: 20
Joined: Feb 2020
I have no clue what criteria are used to paint a cell yellow or green. Can you explain?
Posts: 300
Threads: 72
Joined: Apr 2019
Jul-17-2024, 02:05 PM
(This post was last modified: Jul-17-2024, 02:06 PM by paul18fr.)
Hi
I'm looking for sets of values from V that match exactly in M, without any cell in between; in other words, the order in V must be respected:
- correct order = yellow cells
- opposite order = wrong = green cells
In the first picture, yellow cells were missing. In the new picture, all colored cells except the green (and white) ones are the target.
Expected output:
Output: Number of iteration = 21
Solution #2: Duration for M[7, 9] and V[3, 2] = 0.0009970664978027344
Involved row indexes = [0 1 2 3 4]
Involved elements = [3301 3302 3303 3304 3305]
Attached Files
Thumbnail(s)
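The criterion described above (two adjacent cells, in the order given by V) can be sketched in vectorized NumPy. This is one possible approach, not something taken from the thread: each pair of neighbouring cells is encoded as a single integer key, and the keys are tested with np.isin. The function name rows_with_ordered_pairs is hypothetical, and the sketch assumes non-negative integers small enough that the key fits in int64.

```python
import numpy as np


def rows_with_ordered_pairs(M, V):
    """Indexes of the rows of M (column 0 skipped) holding a row of V as adjacent cells."""
    body = np.asarray(M)[:, 1:].astype(np.int64)
    V = np.asarray(V, dtype=np.int64)
    # Assumption: values are >= 0, so (a, b) -> a * base + b is a unique key.
    base = int(max(body.max(), V.max())) + 1
    pair_keys = body[:, :-1] * base + body[:, 1:]  # one key per adjacent pair
    v_keys = V[:, 0] * base + V[:, 1]
    hit = np.isin(pair_keys, v_keys)
    return np.flatnonzero(hit.any(axis=1))


M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482],
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])
V = np.array([[3469, 3470],
              [3471, 3472],
              [3479, 3480]])

rows = rows_with_ordered_pairs(M, V)
print(rows)        # [0 1 2 3 4]
print(M[rows, 0])  # [3301 3302 3303 3304 3305]
```

The work is proportional to the number of cells in M plus the cost of the membership test, instead of one Python-level comparison per (M row, V row) combination.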
Posts: 300
Threads: 72
Joined: Apr 2019
Jul-17-2024, 02:20 PM
(This post was last modified: Jul-17-2024, 03:33 PM by paul18fr.)
If I manually swap 2 cells in M[3, :], then index 3 becomes invalid (see the new picture). I've found cases that invalidate this hypothesis => V[:, 0] must be the first one found!
I'm dealing with the column position as well (diff = 1), but the code becomes ugly and even slower.
I feel there's a better way to proceed.
M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3481, 3480, 3482],  # 3480 and 3481 swapped
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])
Attached Files
Thumbnail(s)
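If the column position and the matching V row are needed as well, a sorted lookup can return them without a Python loop. The sketch below is only one way to do it: the name locate_pairs is hypothetical, the pairs are encoded as integer keys (assuming non-negative values that fit in int64), and np.searchsorted finds which V row each adjacent pair of M corresponds to. It is run on the modified M above, where row 3 no longer matches.

```python
import numpy as np


def locate_pairs(M, V):
    """Return (row in M, start column in M, row in V) for every adjacent ordered match."""
    body = np.asarray(M)[:, 1:].astype(np.int64)
    V = np.asarray(V, dtype=np.int64)
    base = int(max(body.max(), V.max())) + 1       # assumption: values are >= 0
    keys = body[:, :-1] * base + body[:, 1:]       # key of each adjacent pair
    v_keys = V[:, 0] * base + V[:, 1]
    order = np.argsort(v_keys, kind="stable")
    sorted_keys = v_keys[order]
    pos = np.searchsorted(sorted_keys, keys)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)    # keys above the maximum fall off the end
    hit = sorted_keys[pos] == keys
    rows, cols = np.nonzero(hit)
    return rows, cols + 1, order[pos[rows, cols]]  # +1 because column 0 was skipped


M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3481, 3480, 3482],
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])
V = np.array([[3469, 3470],
              [3471, 3472],
              [3479, 3480]])

m_rows, m_cols, v_rows = locate_pairs(M, V)
print(m_rows)  # [0 1 1 2 4]
print(m_cols)  # [7 3 5 1 1]
print(v_rows)  # [0 0 1 1 2]
```

Row 3 disappears from the result because 3479 is now followed by 3481. If V contains duplicated rows, this sketch reports only one of the duplicates for each match.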
Posts: 1,094
Threads: 143
Joined: Jul 2017
Jul-18-2024, 07:47 AM
(This post was last modified: Jul-18-2024, 07:47 AM by Pedroski55.)
I am not familiar with numpy. But you can do what you want to do like this:
import numpy as np

M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482],
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])

V = np.array([[3469, 3470],
              [3471, 3472],
              [3479, 3480]])


def checkrow(Mrow, rowV, rownum):
    print(f'This is row {rownum}')
    print(f'checking for sequence {rowV} in {Mrow}')
    # the first value of the pair must be present in the row
    if rowV[0] not in Mrow:
        return False
    index = Mrow.index(rowV[0])  # first occurrence only
    # nothing can follow the last column
    if index == len(Mrow) - 1:
        return False
    if Mrow[index + 1] == rowV[1]:
        print('Found a match!')
        print(f'start index = {index}, values are {Mrow[index], Mrow[index + 1]}')
        return True
    return False


count = 0
for m in M:
    rowM = m.tolist()  # plain Python ints
    for v in V:
        rowV = v.tolist()
        res = checkrow(rowM, rowV, count)
    count += 1
Gives:
Output: This is row 0
checking for sequence [3469, 3470] in [3301, 947, 898, 899, 945, 3467, 3468, 3469, 3470]
Found a match!
start index = 7, values are (3469, 3470)
This is row 0
checking for sequence [3471, 3472] in [3301, 947, 898, 899, 945, 3467, 3468, 3469, 3470]
This is row 0
checking for sequence [3479, 3480] in [3301, 947, 898, 899, 945, 3467, 3468, 3469, 3470]
This is row 1
checking for sequence [3469, 3470] in [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474]
Found a match!
start index = 3, values are (3469, 3470)
This is row 1
checking for sequence [3471, 3472] in [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474]
Found a match!
start index = 5, values are (3471, 3472)
This is row 1
checking for sequence [3479, 3480] in [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474]
This is row 2
checking for sequence [3469, 3470] in [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478]
This is row 2
checking for sequence [3471, 3472] in [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478]
Found a match!
start index = 1, values are (3471, 3472)
This is row 2
checking for sequence [3479, 3480] in [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478]
This is row 3
checking for sequence [3469, 3470] in [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482]
This is row 3
checking for sequence [3471, 3472] in [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482]
This is row 3
checking for sequence [3479, 3480] in [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482]
Found a match!
start index = 5, values are (3479, 3480)
This is row 4
checking for sequence [3469, 3470] in [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486]
This is row 4
checking for sequence [3471, 3472] in [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486]
This is row 4
checking for sequence [3479, 3480] in [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486]
Found a match!
start index = 1, values are (3479, 3480)
This is row 5
checking for sequence [3469, 3470] in [4301, 947, 898, 899, 945, 3467, 3468, 3470, 3469]
This is row 5
checking for sequence [3471, 3472] in [4301, 947, 898, 899, 945, 3467, 3468, 3470, 3469]
This is row 5
checking for sequence [3479, 3480] in [4301, 947, 898, 899, 945, 3467, 3468, 3470, 3469]
This is row 6
checking for sequence [3469, 3470] in [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]
This is row 6
checking for sequence [3471, 3472] in [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]
This is row 6
checking for sequence [3479, 3480] in [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]
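For a pure-Python variant that avoids testing every M row against every V row, the rows of V can be stored in a set of tuples and each M row scanned once. This is a hedged sketch rather than the code above: matching_rows is a hypothetical name, column 0 is skipped as in the original question, and every adjacent pair is checked (not only the first occurrence of a value).

```python
def matching_rows(M_rows, V_rows):
    """Indexes of the rows in M_rows that contain a row of V_rows as adjacent cells."""
    wanted = {tuple(v) for v in V_rows}  # set lookup is O(1) on average
    hits = []
    for i, row in enumerate(M_rows):
        body = row[1:]                   # skip the identifier in column 0
        if any(pair in wanted for pair in zip(body, body[1:])):
            hits.append(i)
    return hits


M_rows = [[3301, 947, 898, 899, 945, 3467, 3468, 3469, 3470],
          [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
          [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
          [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482],
          [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
          [4301, 947, 898, 899, 945, 3467, 3468, 3470, 3469],
          [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]]
V_rows = [[3469, 3470], [3471, 3472], [3479, 3480]]

print(matching_rows(M_rows, V_rows))  # [0, 1, 2, 3, 4]
```

With NumPy arrays, M.tolist() and V.tolist() give the plain lists this function expects.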
Posts: 1
Threads: 0
Joined: Jul 2024
In set theory, intersection removes duplicates, which is what this operator does.
Make it simpler.
mask = np.in1d(M, V)
print(mask)
Output: [False False False False False False False True True False False False
True True True True False False False True True False False False
False False False False False False False False True True False False
False True True False False False False False False False False False
False False False False True True False False False False False True
True False False]
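A membership mask of this kind tells which cells hold a value that occurs somewhere in V, but it carries no information about adjacency or order. The short check below uses np.isin, which keeps the 2-D shape of its first argument (np.in1d flattens it and is deprecated in recent NumPy releases), on the M and V from the first post with column 0 skipped.

```python
import numpy as np

M = np.array([[3301,  947,  898,  899,  945, 3467, 3468, 3469, 3470],
              [3302, 3467, 3468, 3469, 3470, 3471, 3472, 3473, 3474],
              [3303, 3471, 3472, 3473, 3474, 3475, 3476, 3477, 3478],
              [3304, 3475, 3476, 3477, 3478, 3479, 3480, 3481, 3482],
              [3305, 3479, 3480, 3481, 3482, 3483, 3484, 3485, 3486],
              [4301,  947,  898,  899,  945, 3467, 3468, 3470, 3469],
              [4304, 3475, 3476, 3477, 3478, 3480, 3479, 3481, 3482]])
V = np.array([[3469, 3470],
              [3471, 3472],
              [3479, 3480]])

# True wherever a cell holds any value that appears anywhere in V.
mask = np.isin(M[:, 1:], V)

# Every row is flagged, including rows 5 and 6 where the pair is reversed.
print(np.flatnonzero(mask.any(axis=1)))  # [0 1 2 3 4 5 6]
```

So the mask alone cannot separate the yellow cells from the green ones; the order of neighbouring cells still has to be checked.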