Python Forum
Longest sequence of repeating integers in a numpy array
#1
Hello,

I am a newbie to Python (and to coding in general), so please forgive me if the question is naive. I did search before asking, but I have not found an answer that fits.

I am trying to show that in a random series of integers, it is perfectly possible to find a sequence of repeating integers that does not look random. To do so, I create a numpy array (in a Jupyter notebook), which I populate with np.random.randint to simulate dice throws:

[python] seq = np.random.randint(1, 7, size = 100) [python]

The size is set to 100 arbitrarily. I may want to increase or decrease the size of the array.
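
If the same throws need to come back on every run (handy when comparing approaches), a seeded generator is one option. This is only a minimal sketch, assuming NumPy 1.17 or later, where np.random.default_rng is available; the seed value 42 is arbitrary and not from the original post.

```python
import numpy as np

# The seed (42) is an arbitrary choice, used only to make the run repeatable.
rng = np.random.default_rng(42)

# Like np.random.randint, integers() excludes the upper bound,
# so (1, 7) yields the die faces 1 through 6.
seq = rng.integers(1, 7, size=100)

print(seq.size, seq.min(), seq.max())
```

Changing `size` works the same way as in the randint call above.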

Where I get stuck is writing the loop to check for the longest run of repeating integers. I am also at a loss as to how to deal with cases where several runs of the same length tie for the longest. I would like to identify them all in that case. Either way, I would like to determine the length of the longest sequence(s) and the integer(s) concerned.
For example, here is a run generated with randint():

-----Generated Random Array----
[2 4 3 3 1 3 1 4 4 2 1 6 5 5 5 1 4 6 3 6 1 5 2 6 3 1 1 5 4 1 3 5 1 4 2 2 6
2 3 1 5 3 1 6 4 5 4 6 5 6 5 6 6 5 5 1 4 2 3 3 5 2 5 1 3 4 3 4 6 6 5 6 1 2
3 2 2 3 2 3 1 5 6 3 3 3 5 3 1 5 6 3 2 2 1 1 4 1 4 1]

If I am not mistaken, the longest running sequences of repeating integers are the 555 and 333 I have highlighted. How do I pick them both out programmatically and show both the length of the sequence and the associated integer?

Thank you for your suggestions and your patience.
#2
You can use np.diff, i.e. difference between the array and shifted array, and find
all islands of zeros:


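import numpy as np

def get_islands(arr, mask):
    # Pad the mask with False so that runs touching either end are detected.
    mask_ = np.concatenate(([False], mask, [False]))
    # Positions where the padded mask flips: even entries start a run, odd entries end it.
    idx = np.flatnonzero(mask_[1:] != mask_[:-1])
    # A run of k True values in the mask covers k + 1 array elements, hence the + 1.
    return [arr[idx[i]:idx[i+1] + 1] for i in range(0, len(idx), 2)]

# The mask is True wherever an element equals its successor;
# the trailing False restores the original length.
get_islands(seq, np.r_[np.diff(seq) == 0, False])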
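
To see why this finds the repeats, it may help to trace the intermediate arrays on a short input. The sketch below only illustrates the same steps one at a time; the array `demo` and the variable names `diffs`, `padded` and `runs` are made up for this walkthrough.

```python
import numpy as np

# Hypothetical short input, chosen so that one run touches the end of the array.
demo = np.array([2, 3, 3, 1, 5, 5, 5])

# np.diff gives demo[i+1] - demo[i]; a zero means two neighbours are equal.
diffs = np.diff(demo)                 # values: 1, 0, -2, 4, 0, 0
mask = np.r_[diffs == 0, False]       # padded back to len(demo)

# Pad with False on both sides, then find the positions where the value flips.
padded = np.concatenate(([False], mask, [False]))
idx = np.flatnonzero(padded[1:] != padded[:-1])   # indices 1, 2, 4, 6

# idx holds (start, stop) pairs; a run of k zeros in diffs covers k + 1 elements.
runs = [demo[idx[i]:idx[i + 1] + 1] for i in range(0, len(idx), 2)]

print(idx)
print(runs)   # the runs are [3, 3] and [5, 5, 5]
```

The pairs (1, 2) and (4, 6) mark where the mask switches on and off, which is why the list comprehension steps through `idx` two entries at a time.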
#3
Hello scidam,

First of all, thank you very much for taking the time to reply. I really appreciate it.

I have no idea how your proposed algorithm works, so I'll just go away and try things out until I understand it. I will come back when I have done my homework.

Kind regards

c

(Jun-07-2020, 12:24 AM)scidam Wrote: You can use np.diff, i.e. difference between the array and shifted array, and find
all islands of zeros:


def get_islands(arr, mask):
    mask_ = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_[1:] != mask_[:-1])
    return [arr[idx[i]:idx[i+1] + 1] for i in range(0, len(idx), 2)]

get_islands(seq, np.r_[np.diff(seq) == 0, False])

scidam,

Thank you very much. I am struggling to understand the implementation but I get the shifting and comparison of differences resulting in zeros when integers are repeated. I will spend more time on it to get a better understanding and be able to reproduce it in different circumstances.

In a run, I got the following results:

[array([1, 1]),
array([3, 3]),
array([4, 4]),
array([2, 2]),
array([6, 6]),
array([2, 2]),
array([6, 6]),
array([4, 4]),
array([2, 2, 2]),
array([2, 2, 2]),
array([4, 4]),
array([6, 6]),
array([5, 5]),
array([3, 3, 3, 3])]

If that is not asking too much, how would I alter the code so that it returns only the longest sequence (in this case, the last entry, array([3, 3, 3, 3]))?

Thank you.
#4
I don't seem to be able to edit a post. Apologies for bundling the end [python] tag in my initial message. I also meant to say "only the longest sequences", plural, in my last message. My example only generated one longest repeating sequence of 4 digits, but if there were two or more of these longest repeating sequences, I would like to retain them, and only them, in the output.

Thank you.
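
One possible way to keep only the longest runs, ties included, is to filter the list that get_islands returns by its maximum length. This is a sketch rather than a definitive answer: `longest_islands` is a hypothetical helper name, get_islands is repeated from the earlier reply so the snippet runs on its own, and the data is the sample array from the first post.

```python
import numpy as np

def get_islands(arr, mask):
    # Same helper as in the earlier reply, repeated here for self-containment.
    mask_ = np.concatenate(([False], mask, [False]))
    idx = np.flatnonzero(mask_[1:] != mask_[:-1])
    return [arr[idx[i]:idx[i + 1] + 1] for i in range(0, len(idx), 2)]

def longest_islands(arr):
    # Hypothetical helper: returns every run that ties for the maximum length.
    arr = np.asarray(arr)
    islands = get_islands(arr, np.r_[np.diff(arr) == 0, False])
    if not islands:
        return []  # no value is repeated back to back
    longest = max(len(run) for run in islands)
    return [run for run in islands if len(run) == longest]

# Sample array copied from the first post (100 throws).
seq = np.array([
    2, 4, 3, 3, 1, 3, 1, 4, 4, 2, 1, 6, 5, 5, 5, 1, 4, 6, 3, 6, 1, 5, 2, 6, 3, 1, 1, 5, 4, 1, 3, 5, 1, 4, 2, 2, 6,
    2, 3, 1, 5, 3, 1, 6, 4, 5, 4, 6, 5, 6, 5, 6, 6, 5, 5, 1, 4, 2, 3, 3, 5, 2, 5, 1, 3, 4, 3, 4, 6, 6, 5, 6, 1, 2,
    3, 2, 2, 3, 2, 3, 1, 5, 6, 3, 3, 3, 5, 3, 1, 5, 6, 3, 2, 2, 1, 1, 4, 1, 4, 1,
])

for run in longest_islands(seq):
    # Each run is constant, so its first element is the repeated integer.
    print(f"value {run[0]} repeated {len(run)} times")
```

On that sample this reports the 5 and the 3, each repeated 3 times, which matches the two runs spotted by eye in the first post.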
#5
Here's an alternate way

import random

# Simulate 100 dice throws as a string of digits ('123456', since a die has no 7).
string = "".join(random.choice('123456') for _ in range(100))

max_string_length = 1        # length of the longest run seen so far
max_string_members = []      # digit of every run that reached that length
current_string_member = ""   # digit of the run currently being counted
current_string_length = 0

for digit in string:
    if digit != current_string_member:
        # A new run starts here.
        current_string_member = digit
        current_string_length = 1
    else:
        current_string_length += 1
    if current_string_length == max_string_length:
        # This run ties the current record, so remember its digit as well.
        max_string_members.append(digit)
    if current_string_length > max_string_length:
        # This run sets a new record, so earlier ties no longer count.
        max_string_members = [digit]
        max_string_length = current_string_length

print(f"The longest sequence found was {max_string_length}")
print(f"The number of times this length was seen was {len(max_string_members)}")
print(max_string_members)
print(string)
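
The same bookkeeping can also be written with itertools.groupby from the standard library, which groups consecutive equal items. This is a different technique from the loop above, shown only as a sketch; `longest_runs` is a hypothetical helper name and the input string is a made-up example.

```python
from itertools import groupby

def longest_runs(values):
    # Collapse consecutive equal items into (value, run_length) pairs.
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(values)]
    if not runs:
        return 0, []
    longest = max(length for _, length in runs)
    # Keep the value of every run that ties for the longest length.
    return longest, [value for value, length in runs if length == longest]

length, members = longest_runs("2433131445551633365")
print(length, members)   # 3 ['5', '3']
```

It accepts any iterable, so a list of integers or `seq.tolist()` from a numpy array should work as well as a string.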
#6
(Jun-07-2020, 05:51 PM)bowlofred Wrote: Here's an alternate way

Sorry about the late reply. I didn't get a notification.

This does the job nicely. As with scidam's solution, I will take it apart to learn how it is built so I can reproduce it elsewhere. Thank you very much for taking the time to answer.