go over and search in numpy array faster

caro · (This post was last modified: Jun-20-2022, 12:11 PM by caro.)

hello,
my array looks like :

[[10,20,'1'],[10,50,'1'],[10,100,'1'],[10,30,'1'],[20,10,'1'],[10,60,'1'],[30,10,'1']

I would like to know how to return an array(newarr) with the following logic(here is a pseudo code):I need it to be implement as faster as possible! please help be to write it in the correct and faster way in python.(maybe using numpy,i understand that numpy is fast)

i=0
for x in arraynumpy
  i=i+1
  for y in arraynumpy[0:i-1]
     if x[0]==y[1] and x[1]==y[0] and x[2]==y[2]
        newarr.append(x)
        continue;  # break the loop for y,if found

the array that will be returned for the input ,will be: [[20,10,'1'],[30,10,'1']]

thank you

**deanhystad** · Jun-20-2022, 03:01 PM

numpy will speed up performing math operations on arrays. This is not math.

The best way to speed something up is do less. This is your code:

def loop_match(data):
    """Find reversed matches by looping through data"""
    result = []
    for index, a in enumerate(data):
        for b in data[index + 1 :]:
            if a[2] == b[2] and a[1] == b[0] and a[0] == b[1]:
                result.append(b)
                break
    return result

This has lots of looping and lots of comparisons.

This uses set operations to find the intersection of the data and the data with the first two elements swapped.

def list_match(data):
    """Find reversed matches using set intersection"""
    data = set(data)
    reversed = set([(x[1], x[0], x[2]) for x in data if x[1] > x[0]])
    return data.intersection(reversed)

I used timeit.Timer() to measure the runtime for processing 1000 tuples.

def list_match(data):
    """Find reversed matches using set intersection"""
    data = set(data)
    reversed = set([(x[1], x[0], x[2]) for x in data if x[1] > x[0]])
    return data.intersection(reversed)


def loop_match(data):
    """Find reversed matches by looping through data"""
    result = []
    for index, a in enumerate(data):
        for b in data[index + 1 :]:
            if a[2] == b[2] and a[1] == b[0] and a[0] == b[1]:
                result.append(b)
                break
    return result


if __name__ == "__main__":
    from timeit import Timer
    from random import randint, choice

    data = [
        (randint(1, 100), randint(1, 100), choice(["1", "2", "3"])) for _ in range(1000)
    ]

    print(sorted(list(list_match(data))))
    print(sorted(loop_match(data)))

    print(
        "Reversed match",
        Timer("list_match(data)", setup="from __main__ import list_match, data").timeit(1000),
        "msec",
    )

    print(
        "Looping",
        Timer("loop_match(data)", setup="from __main__ import loop_match, data").timeit(1000),
        "msec",
    )

Output:[(16, 8, '1'), (28, 24, '2'), (34, 29, '1'), (57, 37, '2'), (58, 29, '3'), (64, 44, '3'), (67, 45, '2'), (78, 59, '3'), (79, 75, '3'), (82, 20, '1'), (82, 41, '3'), (83, 24, '1'), (83, 61, '1'), (85, 55, '3'), (89, 20, '2'), (89, 64, '3'), (90, 72, '1'), (92, 57, '2'), (93, 9, '1'), (93, 74, '2'), (98, 62, '3'), (100, 76, '3')]
[(8, 16, '1'), (9, 93, '1'), (24, 28, '2'), (24, 83, '1'), (29, 34, '1'), (29, 58, '3'), (41, 82, '3'), (44, 64, '3'), (45, 67, '2'), (57, 37, '2'), (61, 83, '1'), (64, 89, '3'), (75, 79, '3'), (76, 100, '3'), (78, 59, '3'), (82, 20, '1'), (85, 55, '3'), (89, 20, '2'), (90, 72, '1'), (92, 57, '2'), (93, 74, '2'), (98, 62, '3')]
Reversed match 0.24131610000000003 msec
Looping 40.840424299999995 msec

The set method is almost 170 times faster than the loop method. As a side benefit the set method always returns tuples with the values in sorted order (high, low).

**Gribouillis** · Jun-20-2022, 03:37 PM

I suggest this implementation, inspired from more_itertools.unique_everseen()

def select(iterable):
    seenset = set()
    seenset_add = seenset.add

    for element in iterable:
        k = element[1], element[0], element[2]
        if k in seenset:
            yield element
        else:
            seenset_add(tuple(element))


arr = [[10,20,'1'],[10,50,'1'],[10,100,'1'],[10,30,'1'],[20,10,'1'],[10,60,'1'],[30,10,'1']]

new = list(select(arr))

print(arr)
print(new)

Output:[[10, 20, '1'], [10, 50, '1'], [10, 100, '1'], [10, 30, '1'], [20, 10, '1'], [10, 60, '1'], [30, 10, '1']]
[[20, 10, '1'], [30, 10, '1']]

caro · Jun-20-2022, 03:39 PM

thank you

**Gribouillis** · Jun-20-2022, 03:50 PM

Thinking again, I think it should be the following, because if you append for example [10, 30, '1'] to the list, it should also be in the output list.

def select(iterable):
    seenset = set()
    seenset_add = seenset.add

    for element in iterable:
        k = element[1], element[0], element[2]
        if k in seenset:
            yield element
        seenset_add(tuple(element))

**deanhystad** · (This post was last modified: Jun-20-2022, 04:00 PM by deanhystad.)

Adding Gribouillis' method to my speed test.

def list_match(data):
    """Find reversed matches using set intersection"""
    data = set(data)
    reversed = set([(x[1], x[0], x[2]) for x in data if x[1] > x[0]])
    return data.intersection(reversed)


def loop_match(data):
    """Find reversed matches by looping through data"""
    result = []
    for index, a in enumerate(data):
        for b in data[index + 1 :]:
            if a[2] == b[2] and a[1] == b[0] and a[0] == b[1]:
                result.append(b)
                break
    return result


def select(iterable):
    seenset = set()
    seenset_add = seenset.add

    for element in iterable:
        k = element[1], element[0], element[2]
        if k in seenset:
            yield element
        else:
            seenset_add(tuple(element))


if __name__ == "__main__":
    from timeit import Timer
    from random import randint, choice

    data = [
        (randint(1, 100), randint(1, 100), choice(["1", "2", "3"])) for _ in range(1000)
    ]

    print("Sets", sorted(list(list_match(data))))
    print("Loop", sorted(loop_match(data)))
    print("Grib", list(select(data)))

    print(
        "Reversed match",
        Timer("list_match(data)", setup="from __main__ import list_match, data").timeit(
            1000
        ),
        "msec",
    )

    print(
        "Looping",
        Timer("loop_match(data)", setup="from __main__ import loop_match, data").timeit(
            1000
        ),
        "msec",
    )

Output:Sets [(24, 14, '2'), (25, 1, '1'), (27, 8, '1'), (38, 26, '1'), (46, 30, '3'), (53, 2, '2'), (54, 10, '1'), (69, 67, '1'), (73, 9, '2'), (79, 22, '1'), (81, 49, '1'), (82, 79, '2'), (87, 69, '2'), (93, 38, '2'), (93, 61, '1'), (95, 58, '3'), (96, 94, '2'), (97, 71, '2'), (99, 19, '2'), (100, 79, '3')]
Loop [(2, 53, '2'), (8, 27, '1'), (19, 99, '2'), (24, 14, '2'), (25, 1, '1'), (38, 26, '1'), (38, 93, '2'), (46, 30, '3'), (49, 81, '1'), (54, 10, '1'), (58, 95, '3'), (61, 93, '1'), (67, 69, '1'), (69, 87, '2'), (73, 9, '2'), (79, 22, '1'), (79, 82, '2'), (79, 100, '3'), (96, 94, '2'), (97, 71, '2')]
Grib [(67, 69, '1'), (2, 53, '2'), (38, 26, '1'), (79, 100, '3'), (24, 14, '2'), (79, 82, '2'), (19, 99, '2'), (69, 87, '2'), (54, 10, '1'), (61, 93, '1'), (38, 93, '2'), (58, 95, '3'), (49, 81, '1'), (79, 22, '1'), (58, 95, '3'), (25, 1, '1'), (96, 94, '2'), (97, 71, '2'), (8, 27, '1'), (73, 9, '2'), (46, 30, '3')]
Reversed match 0.2510886 msec
Looping 41.859004899999995 msec
Gribouillis 0.3636735999999985 msec

Avoid looping.

Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice).

**Gribouillis** · Jun-20-2022, 04:00 PM

(Jun-20-2022, 03:58 PM)deanhystad Wrote: Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice).

I think the initial logic at the top of this thread can also result in duplicate entries.

**deanhystad** · Jun-20-2022, 04:54 PM

I'll add the disclaimer. "Notice that Gribouillis' algorithm can result in duplicate entries (58, 95, '3' appears twice). I don't know if that is important."

The original logic also allowed for pairs like (30, 10, 1) and (10, 30, 1) which I'm not sure about either,

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Numpy, array(2d,bool), flipping regions.	MvGulik	2	1,155	Oct-27-2024, 11:06 AM Last Post: MvGulik
	python code to calculate mean of an array of numbers using numpy	viren	3	1,384	May-29-2024, 04:49 PM Last Post: Gribouillis
	Convert numpy array to image without loading it into RAM.	DreamingInsanity	7	9,901	Feb-08-2024, 09:38 AM Last Post: paul18fr
	IPython errors for numpy array min/max methods	muelaner	1	1,657	Nov-04-2023, 09:22 PM Last Post: snippsat
	Expand the range of a NumPy array?	PythonNPC	0	1,908	Jan-31-2023, 02:41 AM Last Post: PythonNPC
	Change a numpy array to a dataframe	Led_Zeppelin	3	2,983	Jan-26-2023, 09:01 PM Last Post: deanhystad
	from numpy array to csv - rounding	SchroedingersLion	6	6,920	Nov-14-2022, 09:09 PM Last Post: deanhystad
	numpy.array has no attribute head	Led_Zeppelin	1	3,526	Jul-13-2022, 12:56 AM Last Post: Led_Zeppelin
	Seeing al the data in a dataframe or numpy.array	Led_Zeppelin	1	1,838	Jul-11-2022, 08:54 PM Last Post: Larz60+
	Creating a numpy array from specific values of a spreadsheet column	JulianZ	0	1,816	Apr-19-2022, 07:36 AM Last Post: JulianZ

go over and search in numpy array faster

User Panel Messages

Announcements