Count occurrences of numbers in a sequence and return the corresponding value

python_newbie09 · May-19-2019, 12:46 PM

I would like to create a function that goes through each row in a particular dataframe column, say X, and, if the same number appears consecutively 5 or more times, returns the value in the corresponding column, say Y. I am working with time-series data.
For example, the data would look like this:
X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

So in this case it should return 4 and 6, and I would also need to sum up the occurrences of 4 and 6 if this kind of pattern repeats throughout the time series. So the final output would be as below:
Y Count
4 1
6 1
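A minimal sketch of one reading of this, assuming the wanted value is the Y entry at the first index of each run of 5 or more equal values in X. `runs` and `flagged_y` are hypothetical helper names. Under this reading the sample yields 4 and 5, not the 4 and 6 shown in the expected output, so the intended rule may differ.

```python
from itertools import groupby

def runs(values):
    """Return (value, start_index, length) for each run of equal consecutive values."""
    result = []
    start = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        result.append((value, start, length))
        start += length  # next run begins right after this one
    return result

def flagged_y(x, y, min_run=5):
    """Y value at the start of every run in x that is at least min_run long (assumed rule)."""
    return [y[start] for _, start, length in runs(x) if length >= min_run]

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]
print(runs(X))
print(flagged_y(X, Y))  # [4, 5]
```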

Thank you.

ichabod801 · May-19-2019, 02:03 PM

What have you tried?

python_newbie09 · May-19-2019, 06:43 PM

(May-19-2019, 02:03 PM)ichabod801 Wrote: What have you tried?

I have tried something like the code below, but I am stuck on how to return the corresponding value:
from itertools import groupby

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

count_sequence = [sum(1 for _ in group) for _, group in groupby(X)]
print(count_sequence)
for i in count_sequence:
    if i>= 5:
        print(Y[i]) #not sure if this is correct
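The sticking point in the attempt is that `i` is a run length, so `Y[i]` indexes Y by length rather than by where the run starts (with this data it evaluates `Y[5]`, which is 0, twice). A hedged sketch that keeps the same `groupby` run lengths but also tracks a running start index; `position` and `found` are names introduced here for illustration:

```python
from itertools import groupby

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

# Run lengths, exactly as in the attempt above
count_sequence = [sum(1 for _ in group) for _, group in groupby(X)]

found = []
position = 0  # index in X/Y where the current run starts
for length in count_sequence:
    if length >= 5:
        found.append(Y[position])  # index by start position, not by run length
    position += length             # move to the start of the next run

print(found)  # [4, 5]
```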

michalmonday · (This post was last modified: May-19-2019, 08:54 PM by michalmonday.)

(May-19-2019, 12:46 PM)python_newbie09 Wrote: X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

Aren't the 4's in Y supposed to be 5's, because there are 5 occurrences of 2's in X?

(May-19-2019, 12:46 PM)python_newbie09 Wrote: so in this case it should return 4 and 6 and I would also need to sum up the occurrences of 4 and 6 if this kind of pattern repeats again throughout the time series. so the final output would be as below
Y Count
4 1
6 1

I fail to grasp where the 4 and 6 in this final output come from and why their Count is 1.

If the 4 and 6 in the final output are based on Y, then I just don't get it. In the example you posted, Y consists of 5 occurrences of 4 and 5 occurrences of 5. No 6's to be seen.

If the 4 and 6 in the final output are based on X, then it is somewhat understandable where the 6 comes from, but then why is 4 there instead of 2?

______________________________

You mentioned "consecutively", should the function check whether the repetitions are consecutive?
Or will X never actually contain this kind of sequence:
X = [1,1,1,1,1,1,2,1,1,3] # where 1's are separated by "2" or other numbers

If X can actually contain such a sequence, is the output below correct?
Y = [6,6,6,6,6,6,0,0,0,0]
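The distinction matters for this sample: a plain count sees eight 1's, while a consecutive check sees a run of six followed by a separate run of two. A small illustration of both views (variable names are illustrative only):

```python
from collections import Counter
from itertools import groupby

X = [1,1,1,1,1,1,2,1,1,3]

# Total occurrences of each value, ignoring order
total = Counter(X)

# Longest consecutive run of each value
longest = {}
for value, group in groupby(X):
    longest[value] = max(longest.get(value, 0), sum(1 for _ in group))

print(total[1], longest[1])  # 8 6
```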

If the X values don't have to be checked for consecutiveness, you could use something like this:

import numpy as np

X = np.array([2,2,2,2,2,3,4,5,6,6,6,6,6,7,8])

counts = np.array([(X==i).sum() for i in X])

# np.where 2nd and 3rd arguments can be single values or arrays
# the return values sometimes are taken from 2nd and sometimes are taken
# from 3rd array/value, depending on whether the condition from 1st argument
# is met
Y = np.where(counts < 5, 0, counts)
print('Y =\n', Y)

print('np.unique =\n', np.unique(X, return_counts=True))

Output:
Y =
 [5 5 5 5 5 0 0 0 5 5 5 5 5 0 0]
np.unique =
 (array([2, 3, 4, 5, 6, 7, 8]), array([5, 1, 1, 1, 5, 1, 1]))
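The list comprehension `[(X==i).sum() for i in X]` rescans the whole array once per element, so its cost grows quadratically with the length of X. Counting each value once and then looking it up gives the same Y. A hedged pure-Python sketch using `collections.Counter` as a stand-in for the NumPy calls (not part of the original post):

```python
from collections import Counter

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]

freq = Counter(X)                          # value -> total occurrences, like np.unique(..., return_counts=True)
counts = [freq[v] for v in X]              # per-element total, like the list comprehension above
Y = [c if c >= 5 else 0 for c in counts]   # same rule as np.where(counts < 5, 0, counts)

print(Y)                    # [5, 5, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0]
print(sorted(freq.items()))
```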

python_newbie09 · May-20-2019, 04:43 PM

(May-19-2019, 08:15 PM)michalmonday Wrote: You mentioned "consecutively", should the function check whether the repetitions are consecutive?

Sorry for the confusion. I want to access the value in Y when the numbers in X are repeated consecutively. So I may have a time series like the one below:
X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]

So I have to check in X whether a number repeats itself 5 or more times in a row and, if so, return the value shown in Y. For example, the number 5 repeats itself 5 times, so the output should contain the corresponding Y value, 1, only once. Repetitions of 0 should be excluded. So the final output would be:
[1, 3, 5], as the number 5, then 4, then 5 again are each repeated 5 or more times in a row in X.
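A minimal sketch of this rule, assuming "the value in Y" means the Y entry at the first index of each qualifying run (in this sample Y is constant across every qualifying run, so the choice does not change the result). `failure_codes` and its `min_run`/`ignore` parameters are hypothetical names:

```python
from itertools import groupby

def failure_codes(x, y, min_run=5, ignore=(0,)):
    """Y value for each run in x of at least min_run equal values, skipping X values in ignore."""
    codes = []
    start = 0
    for value, group in groupby(x):
        length = sum(1 for _ in group)
        if length >= min_run and value not in ignore:
            codes.append(y[start])  # assumption: take Y at the first index of the run
        start += length
    return codes

X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]
print(failure_codes(X, Y))  # [1, 3, 5]
```

The run of four 6's is skipped because it is shorter than `min_run`, and the run of 0's is skipped by `ignore`.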

michalmonday · (This post was last modified: May-20-2019, 05:29 PM by michalmonday.)

To be honest, now I'm even more confused.

So
X = [5,5,5,5,5]
results in:
Y = [1,1,1,1,1]

but
X = [4,4,4,4,4]
results in
Y = [3,3,3,3,3]

Another thing that is confusing me is that:
X = [1,2,3]
results in
Y = [1,2,3]

but
X = [6,6,6]
results in
Y = [4,6,7]

And in the first example it used to be:
X = [3,4,5]
resulting in
Y = [0,0,0]

What's the logic behind it?

python_newbie09 · May-20-2019, 06:33 PM

(May-20-2019, 05:28 PM)michalmonday Wrote: What's the logic behind it?

I suggest sticking to the latest sample data that I showed. Basically, I am observing data from a machine that outputs numbers in X. When these numbers repeat themselves, it means something is wrong, and the machine then outputs the failure information in Y. That is why I need to know what Y is when X has repeated occurrences: the failure description is tied to the number shown in Y.

The reason I need the information only once per occurrence is that the failure in Y can also repeat itself at a later time point, so even if I used the groupby method on Y, it would not separate these occurrences. For example, Y may be [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3], so I need to know that failure 5 occurred twice in this time series, not the total number of rows it appears in. I hope this is clear, and thanks for your patience.
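Counting runs rather than rows gives the per-occurrence numbers, and tallying them per Y code produces the "Y Count" table from the first post. A hedged sketch; `failure_tally` is a hypothetical name, and the run-length threshold and the exclusion of 0 are taken from the latest sample. No X is given for the Y example above, so the demo passes that sequence as both arguments, purely to show that two separate runs of failure 5 count as 2, not 10:

```python
from collections import Counter
from itertools import groupby

def failure_tally(x, y, min_run=5, ignore=(0,)):
    """Count how many separate qualifying runs in x map to each Y failure code."""
    tally = Counter()
    start = 0
    for value, group in groupby(x):
        length = sum(1 for _ in group)
        if length >= min_run and value not in ignore:
            tally[y[start]] += 1  # one count per run, not per row
        start += length
    return tally

Y_example = [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3]
print(Counter(Y_example)[5])                # 10 rows show failure 5
print(failure_tally(Y_example, Y_example))  # Counter({5: 2, 3: 1})
```

Called on the latest X/Y sample, the same function gives one count each for failures 1, 3 and 5.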
