Python Forum
Longest sequence of repeating integers in a numpy array - Printable Version




Longest sequence of repeating integers in a numpy array - Cricri - Jun-06-2020

Hello,

Python (coding in general) newbie, so please forgive me if the question is naive. I did do a search before asking, but I have not found an answer that fits.

I am trying to show that in a random series of integers, it is perfectly possible to find a sequence of repeating integers that does not look random. To do so, I create a numpy array (in a Jupyter notebook) which I populate with np.random.randint to simulate dice throws:

[python] seq = np.random.randint(1, 7, size = 100) [python]

The size is set to 100 arbitrarily. I may want to increase or decrease the size of the array.

Where I get stuck is writing the loop to check the longest running sequence of repeating integers. I am also at a loss for deciding how to deal with cases when there are several sequences of repeating integers of the same length that are the longest sequences. I would like to identify them all in that case. Either way, I would like to determine the length of the longest sequence(s) and the integer(s) concerned.
For example, here is a run generated with randint():

-----Generated Random Array----
[2 4 3 3 1 3 1 4 4 2 1 6 5 5 5 1 4 6 3 6 1 5 2 6 3 1 1 5 4 1 3 5 1 4 2 2 6
2 3 1 5 3 1 6 4 5 4 6 5 6 5 6 6 5 5 1 4 2 3 3 5 2 5 1 3 4 3 4 6 6 5 6 1 2
3 2 2 3 2 3 1 5 6 3 3 3 5 3 1 5 6 3 2 2 1 1 4 1 4 1]

If I am not mistaken, the longest running sequences of repeating integers are the 5 5 5 in the first row and the 3 3 3 in the third row. How do I pick them both out programmatically and show both the length of the sequence and the associated integer?

Thank you for your suggestions and your patience.


RE: Longest sequence of repeating integers in a numpy array - scidam - Jun-07-2020

You can use np.diff, i.e. difference between the array and shifted array, and find
all islands of zeros:


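import numpy as np

def get_islands(arr, mask):
    # Pad with False so runs touching either end still produce two edges.
    mask_ = np.concatenate(([False], mask, [False]))
    # Indices where the padded mask flips: even entries open a run, odd entries close it.
    idx = np.flatnonzero(mask_[1:] != mask_[:-1])
    return [arr[idx[i]:idx[i + 1] + 1] for i in range(0, len(idx), 2)]

# mask[i] is True where seq[i] == seq[i + 1]
get_islands(seq, np.r_[np.diff(seq) == 0, False])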
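
To see what each step of the np.diff approach produces, here is a minimal walk-through on a small hand-made array. The array and the intermediate names (same_as_next, padded, edges, starts, stops) are made up for illustration and are not part of the code above; it is a sketch of the same idea, unrolled one line at a time.

```python
import numpy as np

# Small hand-made example: one run of 3s and one run of 5s.
seq = np.array([4, 3, 3, 1, 5, 5, 5, 2])

# True at position i when seq[i] == seq[i + 1].
same_as_next = np.diff(seq) == 0

# Append one False so the mask has the same length as seq.
mask = np.r_[same_as_next, False]

# Pad both ends with False so every island of True has a visible
# rising edge and a visible falling edge.
padded = np.concatenate(([False], mask, [False]))

# Positions where the padded mask changes value.
edges = np.flatnonzero(padded[1:] != padded[:-1])

# Edges alternate: rising (island starts), falling (island ends).
starts, stops = edges[0::2], edges[1::2]

# The island covers the comparisons, so the run itself is one element longer.
runs = [seq[a:b + 1] for a, b in zip(starts, stops)]

print(edges)                      # [1 2 4 6]
print([r.tolist() for r in runs])  # [[3, 3], [5, 5, 5]]
```

Each island of True in the mask marks consecutive equal neighbours, which is why the slice adds 1 to the closing edge.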



RE: Longest sequence of repeating integers in a numpy array - Cricri - Jun-07-2020

Hello scidam,

First of all, thank you very much for taking the time to reply. I really appreciate it.

I have no idea how your proposed algorithm works, so I'll just go away and try things out until I understand it. I will come back when I have done my homework.

Kind regards

c

(Jun-07-2020, 12:24 AM)scidam Wrote: You can use np.diff, i.e. difference between the array and shifted array, and find
all islands of zeros:


def get_islands(arr, mask):
    mask_ = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_[1:] != mask_[:-1])
    return [arr[idx[i]:idx[i + 1] + 1] for i in range(0, len(idx), 2)]

get_islands(seq, np.r_[np.diff(seq) == 0, False])

scidam,

Thank you very much. I am struggling to understand the implementation but I get the shifting and comparison of differences resulting in zeros when integers are repeated. I will spend more time on it to get a better understanding and be able to reproduce it in different circumstances.

In a run, I got the following results:

[array([1, 1]),
array([3, 3]),
array([4, 4]),
array([2, 2]),
array([6, 6]),
array([2, 2]),
array([6, 6]),
array([4, 4]),
array([2, 2, 2]),
array([2, 2, 2]),
array([4, 4]),
array([6, 6]),
array([5, 5]),
array([3, 3, 3, 3])]

If that is not asking too much, how would I alter the code so that it returns only the longest sequence (in this case, the last entry, array([3, 3, 3, 3]))?

Thank you.


RE: Longest sequence of repeating integers in a numpy array - Cricri - Jun-07-2020

I don't seem to be able to edit a post. Apologies for bundling the end [python] tag in my initial message. Also meant to say in last message "only the longest sequences", plural. My example only generated one longest repeating sequence of 4 digits, but were there two or more of these longest repeating sequences, I would like to retain them and only them in the output.

Thank you.
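
One possible way to keep only the longest sequences, ties included, is to filter the list returned by get_islands by its maximum length. The sketch below assumes scidam's get_islands as posted; the wrapper name longest_runs and its (value, length) return format are hypothetical choices made for illustration. The test array is the one from the first post.

```python
import numpy as np

def get_islands(arr, mask):
    # scidam's helper from earlier in the thread, unchanged in behaviour.
    mask_ = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_[1:] != mask_[:-1])
    return [arr[idx[i]:idx[i + 1] + 1] for i in range(0, len(idx), 2)]

def longest_runs(seq):
    """Hypothetical wrapper: return (value, length) for every longest run."""
    seq = np.asarray(seq)
    islands = get_islands(seq, np.r_[np.diff(seq) == 0, False])
    if not islands:
        # No value repeats, so every element is a run of length 1.
        return [(int(v), 1) for v in seq]
    best = max(len(run) for run in islands)
    return [(int(run[0]), len(run)) for run in islands if len(run) == best]

# The 100 throws shown in the first post.
seq = np.array([
    2, 4, 3, 3, 1, 3, 1, 4, 4, 2, 1, 6, 5, 5, 5, 1, 4, 6, 3, 6,
    1, 5, 2, 6, 3, 1, 1, 5, 4, 1, 3, 5, 1, 4, 2, 2, 6,
    2, 3, 1, 5, 3, 1, 6, 4, 5, 4, 6, 5, 6, 5, 6, 6, 5, 5, 1, 4,
    2, 3, 3, 5, 2, 5, 1, 3, 4, 3, 4, 6, 6, 5, 6, 1, 2,
    3, 2, 2, 3, 2, 3, 1, 5, 6, 3, 3, 3, 5, 3, 1, 5, 6, 3, 2, 2,
    1, 1, 4, 1, 4, 1,
])

result = longest_runs(seq)
print(result)  # [(5, 3), (3, 3)]
```

Returning the value and the length rather than the sub-arrays keeps the output short when the array grows; the sub-arrays themselves are still available from get_islands if needed.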


RE: Longest sequence of repeating integers in a numpy array - bowlofred - Jun-07-2020

Here's an alternate way

import random

# Simulate 100 throws of a six-sided die as a string of digits.
rolls = "".join(random.choice("123456") for _ in range(100))

max_string_length = 1        # length of the longest run seen so far
max_string_members = []      # digit of every run that reached that length
current_string_member = ""
current_string_length = 0

for digit in rolls:
    if digit != current_string_member:
        # A new run starts here.
        current_string_member = digit
        current_string_length = 1
    else:
        current_string_length += 1
    if current_string_length == max_string_length:
        # This run ties the current record.
        max_string_members.append(digit)
    if current_string_length > max_string_length:
        # This run sets a new record, so earlier ties no longer count.
        max_string_members = [digit]
        max_string_length = current_string_length

print(f"The longest sequence found was {max_string_length}")
print(f"The number of times this length was seen was {len(max_string_members)}")
print(max_string_members)
print(rolls)
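
The same run-scanning idea can also be sketched with itertools.groupby from the standard library, which groups consecutive equal items. This is a different technique from the explicit loop above, not a rewrite of it; the function name longest_runs_groupby is hypothetical, and it assumes the input is any iterable of integers (for a numpy array, seq.tolist() would do).

```python
from itertools import groupby

def longest_runs_groupby(values):
    """Return (value, length) for every run that ties for the longest."""
    # groupby yields one group per stretch of consecutive equal items.
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(values)]
    if not runs:
        return []
    best = max(length for _, length in runs)
    return [(value, length) for value, length in runs if length == best]

example = [2, 2, 5, 5, 5, 1, 3, 3, 3, 4]
print(longest_runs_groupby(example))  # [(5, 3), (3, 3)]
```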



RE: Longest sequence of repeating integers in a numpy array - Cricri - Jun-08-2020

(Jun-07-2020, 05:51 PM)bowlofred Wrote: Here's an alternate way

Sorry about the late reply. I didn't get a notification.

This does the job fine and, as with scidam's, I will take it apart to learn how it is built so I can reproduce it elsewhere. Thank you very much for taking the time to answer.