Posts: 5
Threads: 3
Joined: Jun 2022
Jun-20-2022, 11:23 AM
(This post was last modified: Jun-20-2022, 12:11 PM by caro.)
hello,
my array looks like :
[[10,20,'1'],[10,50,'1'],[10,100,'1'],[10,30,'1'],[20,10,'1'],[10,60,'1'],[30,10,'1']
I would like to know how to return an array(newarr) with the following logic(here is a pseudo code):I need it to be implement as faster as possible! please help be to write it in the correct and faster way in python.(maybe using numpy,i understand that numpy is fast)
1 2 3 4 5 6 7 |
i = 0
for x in arraynumpy
i = i + 1
for y in arraynumpy[ 0 :i - 1 ]
if x[ 0 ] = = y[ 1 ] and x[ 1 ] = = y[ 0 ] and x[ 2 ] = = y[ 2 ]
newarr.append(x)
continue ;
|
the array that will be returned for the input ,will be: [[20,10,'1'],[30,10,'1']]
thank you
Attached Files
Thumbnail(s)
Posts: 6,815
Threads: 20
Joined: Feb 2020
numpy will speed up performing math operations on arrays. This is not math.
The best way to speed something up is do less. This is your code:
1 2 3 4 5 6 7 8 9 |
def loop_match(data):
result = []
for index, a in enumerate (data):
for b in data[index + 1 :]:
if a[ 2 ] = = b[ 2 ] and a[ 1 ] = = b[ 0 ] and a[ 0 ] = = b[ 1 ]:
result.append(b)
break
return result
|
This has lots of looping and lots of comparisons.
This uses set operations to find the intersection of the data and the data with the first two elements swapped.
1 2 3 4 5 |
def list_match(data):
data = set (data)
reversed = set ([(x[ 1 ], x[ 0 ], x[ 2 ]) for x in data if x[ 1 ] > x[ 0 ]])
return data.intersection( reversed )
|
I used timeit.Timer() to measure the runtime for processing 1000 tuples.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
def list_match(data):
data = set (data)
reversed = set ([(x[ 1 ], x[ 0 ], x[ 2 ]) for x in data if x[ 1 ] > x[ 0 ]])
return data.intersection( reversed )
def loop_match(data):
result = []
for index, a in enumerate (data):
for b in data[index + 1 :]:
if a[ 2 ] = = b[ 2 ] and a[ 1 ] = = b[ 0 ] and a[ 0 ] = = b[ 1 ]:
result.append(b)
break
return result
if __name__ = = "__main__" :
from timeit import Timer
from random import randint, choice
data = [
(randint( 1 , 100 ), randint( 1 , 100 ), choice([ "1" , "2" , "3" ])) for _ in range ( 1000 )
]
print ( sorted ( list (list_match(data))))
print ( sorted (loop_match(data)))
print (
"Reversed match" ,
Timer( "list_match(data)" , setup = "from __main__ import list_match, data" ).timeit( 1000 ),
"msec" ,
)
print (
"Looping" ,
Timer( "loop_match(data)" , setup = "from __main__ import loop_match, data" ).timeit( 1000 ),
"msec" ,
)
|
Output: [(16, 8, '1'), (28, 24, '2'), (34, 29, '1'), (57, 37, '2'), (58, 29, '3'), (64, 44, '3'), (67, 45, '2'), (78, 59, '3'), (79, 75, '3'), (82, 20, '1'), (82, 41, '3'), (83, 24, '1'), (83, 61, '1'), (85, 55, '3'), (89, 20, '2'), (89, 64, '3'), (90, 72, '1'), (92, 57, '2'), (93, 9, '1'), (93, 74, '2'), (98, 62, '3'), (100, 76, '3')]
[(8, 16, '1'), (9, 93, '1'), (24, 28, '2'), (24, 83, '1'), (29, 34, '1'), (29, 58, '3'), (41, 82, '3'), (44, 64, '3'), (45, 67, '2'), (57, 37, '2'), (61, 83, '1'), (64, 89, '3'), (75, 79, '3'), (76, 100, '3'), (78, 59, '3'), (82, 20, '1'), (85, 55, '3'), (89, 20, '2'), (90, 72, '1'), (92, 57, '2'), (93, 74, '2'), (98, 62, '3')]
Reversed match 0.24131610000000003 msec
Looping 40.840424299999995 msec
The set method is almost 170 times faster than the loop method. As a side benefit the set method always returns tuples with the values in sorted order (high, low).
Posts: 4,802
Threads: 77
Joined: Jan 2018
I suggest this implementation, inspired from more_itertools.unique_everseen()
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
def select(iterable):
seenset = set ()
seenset_add = seenset.add
for element in iterable:
k = element[ 1 ], element[ 0 ], element[ 2 ]
if k in seenset:
yield element
else :
seenset_add( tuple (element))
arr = [[ 10 , 20 , '1' ],[ 10 , 50 , '1' ],[ 10 , 100 , '1' ],[ 10 , 30 , '1' ],[ 20 , 10 , '1' ],[ 10 , 60 , '1' ],[ 30 , 10 , '1' ]]
new = list (select(arr))
print (arr)
print (new)
|
Output: [[10, 20, '1'], [10, 50, '1'], [10, 100, '1'], [10, 30, '1'], [20, 10, '1'], [10, 60, '1'], [30, 10, '1']]
[[20, 10, '1'], [30, 10, '1']]
Posts: 5
Threads: 3
Joined: Jun 2022
Posts: 4,802
Threads: 77
Joined: Jan 2018
Thinking again, I think it should be the following, because if you append for example [10, 30, '1'] to the list, it should also be in the output list.
1 2 3 4 5 6 7 8 9 |
def select(iterable):
seenset = set ()
seenset_add = seenset.add
for element in iterable:
k = element[ 1 ], element[ 0 ], element[ 2 ]
if k in seenset:
yield element
seenset_add( tuple (element))
|
Posts: 6,815
Threads: 20
Joined: Feb 2020
Jun-20-2022, 04:00 PM
(This post was last modified: Jun-20-2022, 04:00 PM by deanhystad.)
Adding Gribouillis' method to my speed test.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
def list_match(data):
data = set (data)
reversed = set ([(x[ 1 ], x[ 0 ], x[ 2 ]) for x in data if x[ 1 ] > x[ 0 ]])
return data.intersection( reversed )
def loop_match(data):
result = []
for index, a in enumerate (data):
for b in data[index + 1 :]:
if a[ 2 ] = = b[ 2 ] and a[ 1 ] = = b[ 0 ] and a[ 0 ] = = b[ 1 ]:
result.append(b)
break
return result
def select(iterable):
seenset = set ()
seenset_add = seenset.add
for element in iterable:
k = element[ 1 ], element[ 0 ], element[ 2 ]
if k in seenset:
yield element
else :
seenset_add( tuple (element))
if __name__ = = "__main__" :
from timeit import Timer
from random import randint, choice
data = [
(randint( 1 , 100 ), randint( 1 , 100 ), choice([ "1" , "2" , "3" ])) for _ in range ( 1000 )
]
print ( "Sets" , sorted ( list (list_match(data))))
print ( "Loop" , sorted (loop_match(data)))
print ( "Grib" , list (select(data)))
print (
"Reversed match" ,
Timer( "list_match(data)" , setup = "from __main__ import list_match, data" ).timeit(
1000
),
"msec" ,
)
print (
"Looping" ,
Timer( "loop_match(data)" , setup = "from __main__ import loop_match, data" ).timeit(
1000
),
"msec" ,
)
|
Output: Sets [(24, 14, '2'), (25, 1, '1'), (27, 8, '1'), (38, 26, '1'), (46, 30, '3'), (53, 2, '2'), (54, 10, '1'), (69, 67, '1'), (73, 9, '2'), (79, 22, '1'), (81, 49, '1'), (82, 79, '2'), (87, 69, '2'), (93, 38, '2'), (93, 61, '1'), (95, 58, '3'), (96, 94, '2'), (97, 71, '2'), (99, 19, '2'), (100, 79, '3')]
Loop [(2, 53, '2'), (8, 27, '1'), (19, 99, '2'), (24, 14, '2'), (25, 1, '1'), (38, 26, '1'), (38, 93, '2'), (46, 30, '3'), (49, 81, '1'), (54, 10, '1'), (58, 95, '3'), (61, 93, '1'), (67, 69, '1'), (69, 87, '2'), (73, 9, '2'), (79, 22, '1'), (79, 82, '2'), (79, 100, '3'), (96, 94, '2'), (97, 71, '2')]
Grib [(67, 69, '1'), (2, 53, '2'), (38, 26, '1'), (79, 100, '3'), (24, 14, '2'), (79, 82, '2'), (19, 99, '2'), (69, 87, '2'), (54, 10, '1'), (61, 93, '1'), (38, 93, '2'), (58, 95, '3'), (49, 81, '1'), (79, 22, '1'), (58, 95, '3'), (25, 1, '1'), (96, 94, '2'), (97, 71, '2'), (8, 27, '1'), (73, 9, '2'), (46, 30, '3')]
Reversed match 0.2510886 msec
Looping 41.859004899999995 msec
Gribouillis 0.3636735999999985 msec
Avoid looping.
Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice).
Posts: 4,802
Threads: 77
Joined: Jan 2018
(Jun-20-2022, 03:58 PM)deanhystad Wrote: Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice). I think the initial logic at the top of this thread can also result in duplicate entries.
Posts: 6,815
Threads: 20
Joined: Feb 2020
I'll add the disclaimer. "Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice). I don't know if that is important."
The original logic also allowed for pairs like (30, 10, 1) and (10, 30, 1) which I'm not sure about either,
|