Python Forum
count occurrence of numbers in a sequence and return corresponding value
#1
I would like to write a function that goes through each row of a particular DataFrame column, say X, and, if the same number appears 5 or more times consecutively, returns the value in the corresponding column, say Y. I am working with time series data.
For example, the data would look like this:
X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

So in this case it should return 4 and 6, and I would also need to count the occurrences of 4 and 6 if this kind of pattern repeats later in the time series. The final output would be as below:
Y Count
4 1
6 1

thank you.
Reply
#2
What have you tried?
Reply
#3
(May-19-2019, 02:03 PM)ichabod801 Wrote: What have you tried?

I have tried something like the code below, but I am stuck on how to return the corresponding value:
from itertools import groupby

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]
# length of each run of consecutive equal values in X
count_sequence = [sum(1 for _ in group) for _, group in groupby(X)]
print(count_sequence)
for i in count_sequence:
    if i >= 5:
        print(Y[i])  # not sure if this is correct
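
One possible way past that sticking point, offered as a sketch only: the entries of count_sequence are run lengths, not positions, so they cannot be used directly as indexes into Y. Keeping track of where each run starts gives an index that can. The helper name runs_with_start is made up for this illustration, and taking the Y value at the first position of a run is an assumption about what "corresponding value" means.

```python
from itertools import groupby

def runs_with_start(values):
    """Yield (value, start_index, length) for each run of equal consecutive items.

    Hypothetical helper for illustration; not part of any library.
    """
    start = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        yield value, start, length
        start += length  # the next run begins right after this one

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

# Y at the first position of every run of 5 or more equal values in X
hits = [Y[start] for _, start, length in runs_with_start(X) if length >= 5]
print(hits)  # [4, 5]
```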
 
Reply
#4
(May-19-2019, 12:46 PM)python_newbie09 Wrote: X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

Aren't the 4's in Y supposed to be 5's, because there are 5 occurrences of 2 in X?

(May-19-2019, 12:46 PM)python_newbie09 Wrote: so in this case it should return 4 and 6 and I would also need to sum up the occurrences of 4 and 6 if this kind of pattern repeats again throughout the time series. so the final output would be as below
Y Count
4 1
6 1

I fail to grasp where the 4 and 6 in this final output come from and why their Count is 1.

If the 4 and 6 in the final output are based on Y, then I just don't get it. In the example you posted, Y consists of 5 occurrences of 4 and 5 occurrences of 5. No 6's to be seen.

If the 4 and 6 in the final output are based on X, then it is kind of understandable where the 6 is taken from, but then why is 4 there instead of 2?

______________________________

You mentioned "consecutively". Should the function check whether the repetitions are consecutive?
Or will X never actually contain this kind of sequence:
X = [1,1,1,1,1,1,2,1,1,3] # where 1's are separated by "2" or other numbers

If X can actually contain such a sequence, is the output below correct?
Y = [6,6,6,6,6,6,0,0,0,0]

If the X values don't have to be checked for consecutiveness, then you could use something like this:

import numpy as np

X = np.array([2,2,2,2,2,3,4,5,6,6,6,6,6,7,8])

counts = np.array([(X==i).sum() for i in X])

# np.where's 2nd and 3rd arguments can be single values or arrays.
# Each returned element is taken from the 2nd argument where the condition
# in the 1st argument is met, and from the 3rd argument otherwise.
Y = np.where(counts < 5, 0, counts)
print('Y =\n', Y)

print('np.unique =\n', np.unique(X, return_counts=True))
Output:
Y =
 [5 5 5 5 5 0 0 0 5 5 5 5 5 0 0]
np.unique =
 (array([2, 3, 4, 5, 6, 7, 8]), array([5, 1, 1, 1, 5, 1, 1]))
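
If the repetitions do have to be consecutive, a rough pure-Python counterpart could look like the sketch below. It labels every element with the length of the run it belongs to, or 0 when the run is too short. The function name label_runs and the min_run parameter are invented for this example; the threshold of 5 is taken from the first post.

```python
from itertools import groupby

def label_runs(values, min_run=5):
    """Replace each item with the length of its consecutive run,
    or 0 when that run is shorter than min_run.

    Hypothetical helper; min_run=5 mirrors the threshold in the thread.
    """
    labels = []
    for _, group in groupby(values):  # groupby only merges adjacent equal items
        length = len(list(group))
        labels.extend([length if length >= min_run else 0] * length)
    return labels

X = [1,1,1,1,1,1,2,1,1,3]  # the trailing 1's are a separate, shorter run
print(label_runs(X))  # [6, 6, 6, 6, 6, 6, 0, 0, 0, 0]
```

With this approach the two 1's after the 2 are not added to the first six, which is the difference from counting occurrences over the whole array.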
Reply
#5
Sorry for the confusion. I want to access the value in Y when the numbers in X are repeated consecutively. So I may have a time series like the one below:
X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]

So I have to check whether a number in X repeats itself 5 times or more in a row and, if so, return the value shown in Y. For example, the number 5 repeats itself 5 times, so the corresponding value in Y, which is 1, should be output only once. Repetitions of 0 should be excluded. The final output would be:
[1, 3, 5], as the numbers 5, 4 and 5 again are each repeated 5 times in X.
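
A minimal sketch of that rule, under a few assumptions: X and Y have the same length, the Y value at the first position of a qualifying run is the one wanted, and runs of 0 in X are skipped. The function name failure_codes and its parameters are illustrative only.

```python
from itertools import groupby

def failure_codes(x, y, min_run=5, ignore=(0,)):
    """Return the Y value at the start of each run in x of at least min_run
    equal values, skipping runs whose x value is listed in `ignore`.

    Assumes len(x) == len(y); all names here are illustrative.
    """
    codes = []
    start = 0  # index in x (and y) where the current run begins
    for value, group in groupby(x):
        length = len(list(group))
        if length >= min_run and value not in ignore:
            codes.append(y[start])
        start += length
    return codes

X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]
print(failure_codes(X, Y))  # [1, 3, 5]
```

The run of four 6's is dropped because it is shorter than min_run, and the run of five 0's is dropped because 0 is in ignore.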
Reply
#6
Now I'm even more confused to be honest.

So
X = [5,5,5,5,5]
results in:
Y = [1,1,1,1,1]

but
X = [4,4,4,4,4]
results in
Y = [3,3,3,3,3]

Another thing that is confusing me is that:
X = [1,2,3]
results in
Y = [1,2,3]

but
X = [6,6,6]
results in
Y = [4,6,7]

And in the first example it used to be:
X = [3,4,5]
resulting in
Y = [0,0,0]


What's the logic behind it?
Reply
#7
I suggest sticking to the latest sample data that I showed. Basically, I am observing data from a machine that outputs numbers in X. When these numbers repeat, something is wrong, and the machine then outputs the failure information in Y. That is why I need to know the value of Y when X has repeated occurrences: the failure description is tied to the number shown in Y. I need each failure reported only once per occurrence because the same failure in Y can recur at a later point in time, so even if I used the groupby method on Y, it would not separate these occurrences. For example, Y may have [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3], so I need to know that failure 5 occurred twice in this time series, not the total number of 5s. I hope this is clear, and thanks for your patience.
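
For counting separate occurrences, something along these lines might work. Unlike a pandas groupby, itertools.groupby only merges adjacent equal items, so two separate stretches of the same failure code stay separate. This is a sketch that assumes the threshold of 5 and the exclusion of 0 apply to Y directly; count_episodes is a made-up name.

```python
from collections import Counter
from itertools import groupby

def count_episodes(codes, min_run=5, ignore=(0,)):
    """Count how many separate runs of at least min_run items each code has.

    Hypothetical helper; each qualifying run counts once, however long it is.
    """
    counts = Counter()
    for code, group in groupby(codes):
        if code not in ignore and len(list(group)) >= min_run:
            counts[code] += 1
    return counts

Y = [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3]
print(count_episodes(Y))  # Counter({5: 2, 3: 1})
```

Here failure 5 is reported twice and failure 3 once, rather than ten 5's and six 3's.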
Reply

