Python Forum

count occurrence of numbers in a sequence and return corresponding value
I would like to create a function that goes through each row of a particular dataframe column, say X, and, if the same number appears consecutively 5 or more times, returns the value in the corresponding column, say Y. I am working with time-series data.
For example, the data would look like this:
X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

So in this case it should return 4 and 6. I would also need to sum up the occurrences of 4 and 6 if this kind of pattern repeats later in the time series, so the final output would be as below:
Y Count
4 1
6 1

Thank you.
What have you tried?
(May-19-2019, 02:03 PM) ichabod801 wrote: What have you tried?

I have tried something like the code below, but I am stuck on how to return the corresponding value:
from itertools import groupby

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]
count_sequence = [sum(1 for _ in group) for _, group in groupby(X)]
print(count_sequence)
for i in count_sequence:
    if i >= 5:
        print(Y[i])  # not sure if this is correct
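The loop above iterates over run lengths, so the loop variable i is a length rather than a position in Y: both long runs have length 5, so it prints Y[5] (which is 0) twice. A minimal sketch of one way to get the corresponding value, assuming the Y value on the first row of each run is the one wanted, keeps a running start index alongside groupby (run_values and min_run are illustrative names, not from the thread):

```python
from collections import Counter
from itertools import groupby

X = [2, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, 6, 7, 8]
Y = [4, 4, 4, 4, 4, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0]


def run_values(x, y, min_run=5):
    """Return y at the first row of every run in x that is at least min_run long."""
    result = []
    start = 0  # index in x where the current run begins
    for _, group in groupby(x):
        length = sum(1 for _ in group)
        if length >= min_run:
            result.append(y[start])  # the y value aligned with the run's first row
        start += length  # the next run begins right after this one
    return result


print(run_values(X, Y))           # [4, 5]
print(Counter(run_values(X, Y)))  # Counter({4: 1, 5: 1})
```

With this data the Y values picked up are 4 and 5, and Counter turns that list into a Y/Count table like the one in the question.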
 
(May-19-2019, 12:46 PM) python_newbie09 wrote: X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

Aren't the 4's in Y supposed to be 5's, because there are 5 occurrences of 2 in X?

(May-19-2019, 12:46 PM) python_newbie09 wrote: so in this case it should return 4 and 6 and I would also need to sum up the occurrences of 4 and 6 if this kind of pattern repeats again throughout the time series. so the final output would be as below
Y Count
4 1
6 1

I fail to grasp where the 4 and 6 in this final output come from and why their Count is 1.

If they come from Y, then I just don't get it. In the example you posted, Y consists of 5 occurrences of 4 and 5 occurrences of 5. There are no 6's to be seen.

If they come from X, then it is somewhat understandable where the 6 comes from, but then why is 4 there instead of 2?

______________________________

You mentioned "consecutively"; should the function check whether the repetitions are consecutive?
Or will X never actually contain this kind of sequence:
X = [1,1,1,1,1,1,2,1,1,3] # where the 1's are separated by "2" or other numbers

If X can actually contain such a sequence, is the output below correct?
Y = [6,6,6,6,6,6,0,0,0,0]
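To show how the two readings differ on this X, here is a small standard-library sketch (the function names are illustrative) that computes, for every element, the length of its consecutive run and, separately, the total count of its value, then zeroes anything below 5:

```python
from collections import Counter
from itertools import groupby

X = [1, 1, 1, 1, 1, 1, 2, 1, 1, 3]


def consecutive_counts(x):
    """For each element, the length of the consecutive run it belongs to."""
    out = []
    for _, group in groupby(x):
        n = len(list(group))
        out.extend([n] * n)
    return out


def total_counts(x):
    """For each element, how often its value occurs anywhere in x."""
    totals = Counter(x)
    return [totals[v] for v in x]


def threshold(counts, min_count=5):
    """Replace counts below min_count with 0."""
    return [c if c >= min_count else 0 for c in counts]


print(threshold(consecutive_counts(X)))  # [6, 6, 6, 6, 6, 6, 0, 0, 0, 0]
print(threshold(total_counts(X)))        # [8, 8, 8, 8, 8, 8, 0, 8, 8, 0]
```

Only the consecutive version reproduces the output proposed above; the total count also marks the later pair of 1's.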

If the X values don't have to be checked for consecutiveness, you could use something like this:

import numpy as np

X = np.array([2,2,2,2,2,3,4,5,6,6,6,6,6,7,8])

# total number of times each element's value occurs anywhere in X
counts = np.array([(X==i).sum() for i in X])

# np.where's 2nd and 3rd arguments can be single values or arrays;
# each element of the result is taken from the 2nd argument where the
# condition (1st argument) is True, and from the 3rd where it is False
Y = np.where(counts < 5, 0, counts)
print('Y =\n', Y)

print('np.unique =\n', np.unique(X, return_counts=True))
Output:
Y =
 [5 5 5 5 5 0 0 0 5 5 5 5 5 0 0]
np.unique =
 (array([2, 3, 4, 5, 6, 7, 8]), array([5, 1, 1, 1, 5, 1, 1]))
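The list comprehension compares every element against the whole array, so its cost grows quadratically with the length of X. A sketch of the same non-consecutive count using np.unique with return_inverse and return_counts, which maps each element back to the count of its value:

```python
import numpy as np

X = np.array([2, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, 6, 7, 8])

# values: sorted unique values of X
# inverse: position of each X element within values
# per_value: how many times each unique value occurs
values, inverse, per_value = np.unique(X, return_inverse=True, return_counts=True)
counts = per_value[inverse]  # total count of each element's value, same length as X

Y = np.where(counts < 5, 0, counts)
print(Y)  # [5 5 5 5 5 0 0 0 5 5 5 5 5 0 0]
```

Like the original, this ignores whether the repetitions are consecutive.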
(May-19-2019, 08:15 PM) michalmonday wrote: [...]

Sorry for the confusion. I want to access the value in Y when the numbers in X are repeated consecutively. So I may have a time series like this:
X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]

So I have to check in X whether a number repeats itself 5 or more times in a row and, if it does, return the value shown in Y. For example, the number 5 repeats itself 5 times, so the output should contain the corresponding Y value, 1, only once. Repetitions of 0 should be excluded. So the final output would be:
[1, 3, 5], because the numbers 5, 4 and 5 again are each repeated 5 or more times in X
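A vectorised NumPy sketch of this rule, assuming that the Y value on the first row of each qualifying run is the one to report and that 0 is the only X value to skip (failure_codes, min_run and ignore are illustrative names). Run starts are found by comparing each element with its predecessor, and run lengths are the gaps between consecutive starts:

```python
import numpy as np

X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]


def failure_codes(x, y, min_run=5, ignore=0):
    """Y value at the start of each run of >= min_run equal X values (X != ignore)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.size == 0:
        return []
    # True wherever a new run starts (the first row always starts one)
    is_start = np.r_[True, x[1:] != x[:-1]]
    starts = np.flatnonzero(is_start)
    # run length = distance to the next run start (or to the end of the data)
    lengths = np.diff(np.r_[starts, x.size])
    keep = (lengths >= min_run) & (x[starts] != ignore)
    return y[starts[keep]].tolist()


print(failure_codes(X, Y))  # [1, 3, 5]
```

The run of four 6's is too short and the run of 0's is skipped, which leaves 1, 3 and 5. A dataframe column can be passed in directly, or converted first with Series.to_numpy().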
Now I'm even more confused to be honest.

So
X = [5,5,5,5,5]
results in:
Y = [1,1,1,1,1]

but
X = [4,4,4,4,4]
results in
Y = [3,3,3,3,3]

Another thing that is confusing me is that:
X = [1,2,3]
results in
Y = [1,2,3]

but
X = [6,6,6,6]
results in
Y = [4,6,7,8]

And in the first example it used to be:
X = [3,4,5]
resulting in
Y = [0,0,0]


What's the logic behind it?
(May-20-2019, 05:28 PM) michalmonday wrote: [...]

I suggest sticking to the latest sample data I showed. Basically, I am observing data from a machine that outputs numbers in X; when these numbers repeat, something is wrong, and the machine then outputs the failure information in Y. That is why I need to know the value of Y when X has repeated occurrences: the failure description is tied to the number shown in Y.

The reason I need the information only once per occurrence is that the same failure in Y can recur at a later time point, so even if I used the groupby method on Y, it would not separate these occurrences. For example, Y may be [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3], so I need to know that failure 5 occurred twice in this time series, not the sum of its repetitions. I hope this is clear, and thanks for your patience.
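Assuming, as in this sample, that each failure episode shows up as an unbroken run of the same code, a standard-library sketch can count episodes instead of rows. itertools.groupby only groups adjacent equal items (unlike an aggregating group-by such as pandas DataFrame.groupby), which is what keeps the two runs of 5 apart. count_episodes, min_run and ignore are illustrative names:

```python
from collections import Counter
from itertools import groupby

Y = [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3]


def count_episodes(codes, min_run=5, ignore=(0,)):
    """Count separate runs of >= min_run identical codes, per code."""
    episodes = Counter()
    for code, group in groupby(codes):  # each group is one run of adjacent equal codes
        if code not in ignore and sum(1 for _ in group) >= min_run:
            episodes[code] += 1
    return episodes


print(count_episodes(Y))  # Counter({5: 2, 3: 1})
print(Counter(Y)[5])      # 10 (plain counting adds both episodes together)
```

The same Counter can also be applied to a list of Y values collected once per qualifying run of X, such as the [1, 3, 5] result described earlier in the thread.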