go over and search in numpy array faster (Python Forum, https://python-forum.io, General Coding Help, thread-37503)
go over and search in numpy array faster - caro - Jun-20-2022

Hello, my array looks like the one below, and I would like to return an array (newarr) built with the following logic:

```python
arraynumpy = [[10, 20, '1'], [10, 50, '1'], [10, 100, '1'], [10, 30, '1'],
              [20, 10, '1'], [10, 60, '1'], [30, 10, '1']]

newarr = []
for i, x in enumerate(arraynumpy):
    for y in arraynumpy[:i]:  # only the rows that come before x
        if x[0] == y[1] and x[1] == y[0] and x[2] == y[2]:
            newarr.append(x)
            break  # stop scanning y once a match is found
```

For this input, the returned array will be: [[20,10,'1'],[30,10,'1']]

I need it to be implemented as fast as possible! Please help me write it correctly and efficiently in Python (maybe using numpy; I understand that numpy is fast). Thank you

RE: go over and search in numpy array faster - deanhystad - Jun-20-2022

numpy will speed up math operations on arrays. This is not math. The best way to speed something up is to do less. This is your code:

```python
def loop_match(data):
    """Find reversed matches by looping through data"""
    result = []
    for index, a in enumerate(data):
        for b in data[index + 1 :]:
            if a[2] == b[2] and a[1] == b[0] and a[0] == b[1]:
                result.append(b)
                break
    return result
```

This has lots of looping and lots of comparisons. The version below uses set operations to find the intersection of the data and the data with the first two elements swapped.

```python
def list_match(data):
    """Find reversed matches using set intersection"""
    data = set(data)  # rows must be hashable, e.g. tuples
    reversed = set([(x[1], x[0], x[2]) for x in data if x[1] > x[0]])
    return data.intersection(reversed)
```

I used timeit.Timer() to measure the runtime for processing 1000 tuples.
```python
def list_match(data):
    """Find reversed matches using set intersection"""
    data = set(data)
    reversed = set([(x[1], x[0], x[2]) for x in data if x[1] > x[0]])
    return data.intersection(reversed)


def loop_match(data):
    """Find reversed matches by looping through data"""
    result = []
    for index, a in enumerate(data):
        for b in data[index + 1 :]:
            if a[2] == b[2] and a[1] == b[0] and a[0] == b[1]:
                result.append(b)
                break
    return result


if __name__ == "__main__":
    from timeit import Timer
    from random import randint, choice

    data = [
        (randint(1, 100), randint(1, 100), choice(["1", "2", "3"]))
        for _ in range(1000)
    ]
    print(sorted(list(list_match(data))))
    print(sorted(loop_match(data)))
    # timeit(1000) returns the total seconds for 1000 calls,
    # which equals the average milliseconds per call
    print(
        "Reversed match",
        Timer("list_match(data)", setup="from __main__ import list_match, data").timeit(1000),
        "msec",
    )
    print(
        "Looping",
        Timer("loop_match(data)", setup="from __main__ import loop_match, data").timeit(1000),
        "msec",
    )
```

The set method is almost 170 times faster than the loop method. As a side benefit, the set method always returns tuples with the values in sorted order (high, low).
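One practical detail: `list_match` puts the rows into a `set`, so every row must be hashable. The list of lists from the first post would raise `TypeError: unhashable type: 'list'`, and a NumPy array built from mixed ints and strings stores every value as a string, so converting rows to tuples (for an array, via `ndarray.tolist()` first) is a reasonable preprocessing step. A minimal sketch, assuming plain Python lists as input; `to_tuples` is a hypothetical helper name, not part of any library:

```python
def list_match(data):
    """Find reversed matches using set intersection (deanhystad's method)."""
    data = set(data)
    # Mirror only rows whose second value is larger than the first.
    swapped = {(x[1], x[0], x[2]) for x in data if x[1] > x[0]}
    return data.intersection(swapped)


def to_tuples(rows):
    """Hypothetical helper: turn each row (a list) into a hashable tuple."""
    return [tuple(row) for row in rows]


arr = [[10, 20, '1'], [10, 50, '1'], [10, 100, '1'], [10, 30, '1'],
       [20, 10, '1'], [10, 60, '1'], [30, 10, '1']]

# list_match(arr) would fail here because lists cannot be stored in a set.
result = sorted(list_match(to_tuples(arr)))
print(result)  # [(20, 10, '1'), (30, 10, '1')]
```

The result is a set, so input order and duplicates are not preserved, and only the (high, low) member of each pair is kept; for this example that happens to match the expected output.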
RE: go over and search in numpy array faster - Gribouillis - Jun-20-2022

I suggest this implementation, inspired by more_itertools.unique_everseen():

```python
def select(iterable):
    seenset = set()
    seenset_add = seenset.add  # bind the method once to avoid repeated lookups
    for element in iterable:
        k = element[1], element[0], element[2]  # the mirrored row
        if k in seenset:
            yield element
        else:
            seenset_add(tuple(element))


arr = [[10,20,'1'],[10,50,'1'],[10,100,'1'],[10,30,'1'],[20,10,'1'],[10,60,'1'],[30,10,'1']]
new = list(select(arr))
print(arr)
print(new)
```
RE: go over and search in numpy array faster - caro - Jun-20-2022

Thank you

RE: go over and search in numpy array faster - Gribouillis - Jun-20-2022

Thinking again, I think it should be the following, because if you append, for example, [10, 30, '1'] to the end of the input list, it should also appear in the output list.

```python
def select(iterable):
    seenset = set()
    seenset_add = seenset.add
    for element in iterable:
        k = element[1], element[0], element[2]
        if k in seenset:
            yield element
        seenset_add(tuple(element))  # remember every row, matched or not
```

RE: go over and search in numpy array faster - deanhystad - Jun-20-2022

Adding Gribouillis' method to my speed test.

```python
def list_match(data):
    """Find reversed matches using set intersection"""
    data = set(data)
    reversed = set([(x[1], x[0], x[2]) for x in data if x[1] > x[0]])
    return data.intersection(reversed)


def loop_match(data):
    """Find reversed matches by looping through data"""
    result = []
    for index, a in enumerate(data):
        for b in data[index + 1 :]:
            if a[2] == b[2] and a[1] == b[0] and a[0] == b[1]:
                result.append(b)
                break
    return result


def select(iterable):
    seenset = set()
    seenset_add = seenset.add
    for element in iterable:
        k = element[1], element[0], element[2]
        if k in seenset:
            yield element
        else:
            seenset_add(tuple(element))


if __name__ == "__main__":
    from timeit import Timer
    from random import randint, choice

    data = [
        (randint(1, 100), randint(1, 100), choice(["1", "2", "3"]))
        for _ in range(1000)
    ]
    print("Sets", sorted(list(list_match(data))))
    print("Loop", sorted(loop_match(data)))
    print("Grib", list(select(data)))
    print(
        "Reversed match",
        Timer("list_match(data)", setup="from __main__ import list_match, data").timeit(1000),
        "msec",
    )
    print(
        "Looping",
        Timer("loop_match(data)", setup="from __main__ import loop_match, data").timeit(1000),
        "msec",
    )
```

Avoid looping. Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice).
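If repeated entries in the output are not wanted, one option is to also remember which rows have already been yielded and skip them, in the spirit of `more_itertools.unique_everseen()`. A minimal sketch building on Gribouillis' revised `select` (every row goes into the seen set); the name `select_unique` and the sample data are illustrative only:

```python
def select_unique(iterable):
    """Yield each row whose mirror (b, a, c) appeared earlier, at most once per row."""
    seen = set()     # every row encountered so far, stored as tuples
    emitted = set()  # rows already yielded, used to suppress duplicates
    for element in iterable:
        row = tuple(element)
        mirror = (row[1], row[0], row[2])
        if mirror in seen and row not in emitted:
            emitted.add(row)
            yield element
        seen.add(row)


data = [(10, 20, '1'), (20, 10, '1'), (20, 10, '1'), (10, 30, '1'), (30, 10, '1')]
print(list(select_unique(data)))  # [(20, 10, '1'), (30, 10, '1')]
```

The revised `select` would yield (20, 10, '1') twice for this input; the extra `emitted` set costs one more membership test per row and keeps the single pass.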
RE: go over and search in numpy array faster - Gribouillis - Jun-20-2022

> (Jun-20-2022, 03:58 PM) deanhystad wrote: Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice).

I think the initial logic at the top of this thread can also result in duplicate entries.

RE: go over and search in numpy array faster - deanhystad - Jun-20-2022

I'll add the disclaimer: "Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice). I don't know if that is important." The original logic also allowed for pairs like (30, 10, 1) and (10, 30, 1), which I'm not sure about either.
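Gribouillis' remark can be checked by comparing a direct translation of the nested-loop logic from the first post with his revised `select` (the version without `else`, reproduced here slightly condensed). A sketch, assuming each row is a list or tuple of three items; `original_logic` is a hypothetical name for the translated loop:

```python
import random


def original_logic(rows):
    """Nested-loop translation of the question's logic (O(n^2) comparisons)."""
    newarr = []
    for i, x in enumerate(rows):
        for y in rows[:i]:  # only rows that come before x
            if x[0] == y[1] and x[1] == y[0] and x[2] == y[2]:
                newarr.append(x)
                break  # one earlier match is enough for this x
    return newarr


def select(iterable):
    """Gribouillis' revised generator: every row is added to the seen set."""
    seenset = set()
    for element in iterable:
        if (element[1], element[0], element[2]) in seenset:
            yield element
        seenset.add(tuple(element))


# Random tuples with many collisions, so both duplicates and mirrors occur.
rows = [(random.randint(1, 10), random.randint(1, 10), random.choice("12"))
        for _ in range(500)]
assert original_logic(rows) == list(select(rows))

# Both keep repeats and both orientations of a pair:
small = [(10, 30, '1'), (30, 10, '1'), (10, 30, '1'), (30, 10, '1')]
print(original_logic(small))  # [(30, 10, '1'), (10, 30, '1'), (30, 10, '1')]
```

Since `select` yields a row exactly when its mirror occurred anywhere earlier, it reproduces the original logic in a single pass, including the duplicates and the (30, 10) / (10, 30) pairs discussed above.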