Posts: 53
Threads: 25
Joined: Jan 2018
I would like to write a function that goes through each row of a particular DataFrame column, say X, and, if the same number appears 5 or more times consecutively, returns the value in the corresponding column, say Y. I am working with time-series data.
For example, the data would look like this:
X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]
So in this case it should return 4 and 6. I would also need to count the occurrences of 4 and 6 if this kind of pattern repeats later in the time series, so the final output would be as below:
Y Count
4 1
6 1
Thank you.
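Since the data lives in a DataFrame, one common pandas idiom for this kind of run detection is to number the runs by comparing each row of X with the previous one (`shift`), take a cumulative sum, and then aggregate per run. A minimal sketch, assuming the value to report is the Y value in the first row of each run of 5 or more equal X values (that choice is an assumption, not something the post specifies):

```python
import pandas as pd

df = pd.DataFrame({
    "X": [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8],
    "Y": [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0],
})

# A new run starts wherever X differs from the previous row;
# cumsum then numbers the runs 1, 2, 3, ...
run_id = df["X"].ne(df["X"].shift()).cumsum().rename("run")

# One row per run: its X value, the Y value in its first row, and its length
runs = df.groupby(run_id).agg(
    x=("X", "first"),
    y=("Y", "first"),
    length=("X", "count"),
)

# Keep only runs of 5 or more and count how often each Y value starts such a run
long_runs = runs[runs["length"] >= 5]
counts = long_runs["y"].value_counts().rename_axis("Y").reset_index(name="Count")
print(counts)
```

With this sample data the sketch reports the Y values 4 and 5, one occurrence each, because the run of 6s in X lines up with 5s in Y.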
Posts: 4,220
Threads: 97
Joined: Sep 2016
Posts: 53
Threads: 25
Joined: Jan 2018
(May-19-2019, 02:03 PM)ichabod801 Wrote: What have you tried?
I have tried something like the code below, but I am stuck on how to return the corresponding value:
from itertools import groupby

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]
count_sequence = [sum(1 for _ in group) for _, group in groupby(X)]
print(count_sequence)
for i in count_sequence:
    if i >= 5:
        print(Y[i])  # not sure if this is correct (i is a run length here, not a position in Y)
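In the attempt above, `i` is a run length, so `Y[i]` looks up position 5 instead of the position where the run begins. A minimal sketch of one way to keep the `groupby` approach and fix the lookup: compute each run's start index as the running total of the earlier run lengths (`itertools.accumulate`) and index Y with that. Reporting the Y value at the start of the run is an assumption.

```python
from itertools import accumulate, groupby

X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]

# Length of every run of equal consecutive values in X
count_sequence = [sum(1 for _ in group) for _, group in groupby(X)]
# Start index of each run: 0, followed by the running totals of the earlier run lengths
starts = [0] + list(accumulate(count_sequence))[:-1]

found = []
for start, length in zip(starts, count_sequence):
    if length >= 5:
        found.append(Y[start])  # Y value where the long run begins
print(found)  # [4, 5]
```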
Posts: 95
Threads: 3
Joined: May 2019
May-19-2019, 08:15 PM
(May-19-2019, 12:46 PM)python_newbie09 Wrote: X = [2,2,2,2,2,3,4,5,6,6,6,6,6,7,8]
Y = [4,4,4,4,4,0,0,0,5,5,5,5,5,0,0]
Aren't the 4s in Y supposed to be 5s, because there are 5 occurrences of 2 in X?
(May-19-2019, 12:46 PM)python_newbie09 Wrote: so in this case it should return 4 and 6 and I would also need to sum up the occurrences of 4 and 6 if this kind of pattern repeats again throughout the time series. so the final output would be as below
Y Count
4 1
6 1
I fail to grasp what is the origin of 4 and 6 in this final output and how their Count is 1.
If the origin of 4 and 6 in the final output is based on Y, then I just don't get it. In the example you posted, Y consists of 5 occurrences of 4 and 5 occurrences of 5. No 6s to be seen.
If the origin of 4 and 6 in the final output is based on X, then it is kind of understandable where the 6 is taken from, but then why is 4 also there instead of 2?
______________________________
You mentioned "consecutively"; should the function check whether the repetitions are consecutive?
Or will X never actually contain this kind of sequence:
X = [1,1,1,1,1,1,2,1,1,3] # where 1's are separated by "2" or other numbers
If X can actually contain such a sequence, then is the output below correct?
Y = [6,6,6,6,6,6,0,0,0,0]
If the X values don't have to be checked for consecutiveness, then you could use something like this:
import numpy as np
X = np.array([2,2,2,2,2,3,4,5,6,6,6,6,6,7,8])
counts = np.array([(X==i).sum() for i in X])
# np.where 2nd and 3rd arguments can be single values or arrays
# the return values sometimes are taken from 2nd and sometimes are taken
# from 3rd array/value, depending on whether the condition from 1st argument
# is met
Y = np.where(counts < 5, 0, counts)
print('Y =\n', Y)
print('np.unique =\n', np.unique(X, return_counts=True))
Output:
Y =
[5 5 5 5 5 0 0 0 5 5 5 5 5 0 0]
np.unique =
(array([2, 3, 4, 5, 6, 7, 8]), array([5, 1, 1, 1, 5, 1, 1]))
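If the repetitions do have to be consecutive, a similar per-element output can be computed with NumPy by locating the positions where the value changes. A sketch, assuming a minimum run length of 5 (`consecutive_run_lengths` is a hypothetical helper name); the last lines contrast it with the total-count approach on the sequence where the 1s are split by a 2:

```python
import numpy as np

def consecutive_run_lengths(x, min_len=5):
    """Length of the consecutive run each element belongs to, or 0 if shorter than min_len."""
    x = np.asarray(x)
    # Index 0 plus every index where the value differs from the previous element
    run_starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Run lengths are the gaps between consecutive run starts (and the array end)
    lengths = np.diff(np.r_[run_starts, x.size])
    # Give every element the length of the run it belongs to
    per_element = np.repeat(lengths, lengths)
    return np.where(per_element < min_len, 0, per_element)

X1 = np.array([2,2,2,2,2,3,4,5,6,6,6,6,6,7,8])
X2 = np.array([1,1,1,1,1,1,2,1,1,3])
print(consecutive_run_lengths(X1))  # [5 5 5 5 5 0 0 0 5 5 5 5 5 0 0]
print(consecutive_run_lengths(X2))  # [6 6 6 6 6 6 0 0 0 0]

# Total-count approach for comparison: all eight 1s are counted together
total = np.array([(X2 == v).sum() for v in X2])
print(np.where(total < 5, 0, total))  # [8 8 8 8 8 8 0 8 8 0]
```

For the first X both approaches agree; they only differ once the same value reappears after being interrupted.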
Posts: 53
Threads: 25
Joined: Jan 2018
(May-19-2019, 08:15 PM)michalmonday Wrote: I fail to grasp what is the origin of 4 and 6 in this final output and how their Count is 1.
Sorry for the confusion. I want to access the value in Y when the numbers in X are repeated consecutively, so I may have a time series like the one below:
X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]
So I have to check in X whether a number repeats itself 5 or more times in a row and, if it does, return the value shown in Y. For example, the number 5 repeats itself 5 times, so the output should contain the corresponding Y value, 1, only once. Repetitions of 0 should be excluded, so the final output would be:
[1, 3, 5], because 5, then 4, then 5 again are each repeated 5 or more times in X.
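A minimal sketch of this rule with plain `itertools.groupby`, assuming that "repetitions of 0" means runs of 0 in X and that the Y value to report is the one at the start of each qualifying run. `failures_from_runs` and its `ignore` parameter are hypothetical names chosen for illustration:

```python
from itertools import groupby

def failures_from_runs(x, y, min_len=5, ignore=(0,)):
    """Return the Y value at the start of each run of >= min_len equal values in X."""
    result = []
    position = 0  # index in X (and Y) where the current run starts
    for value, group in groupby(x):
        length = sum(1 for _ in group)
        if length >= min_len and value not in ignore:
            result.append(y[position])
        position += length
    return result

X = [5,5,5,5,5,0,0,0,0,0,4,4,4,4,4,1,2,3,6,6,6,6,5,5,5,5,5,2,4,6,7]
Y = [1,1,1,1,1,0,0,0,0,0,3,3,3,3,3,1,2,3,4,6,7,8,5,5,5,5,5,2,4,6,7]
print(failures_from_runs(X, Y))  # [1, 3, 5]
```

The run of four 6s is skipped because it is shorter than 5, and each long run contributes exactly one entry, so the second run of 5s is reported separately from the first.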
Posts: 95
Threads: 3
Joined: May 2019
May-20-2019, 05:28 PM
Now I'm even more confused, to be honest.
So X = [5,5,5,5,5] results in Y = [1,1,1,1,1], but X = [4,4,4,4,4] results in Y = [3,3,3,3,3].
Another thing that confuses me is that X = [1,2,3] results in Y = [1,2,3], but X = [6,6,6] results in Y = [4,6,7].
And in the first example, X = [3,4,5] used to result in Y = [0,0,0].
What's the logic behind it?
Posts: 53
Threads: 25
Joined: Jan 2018
(May-20-2019, 05:28 PM)michalmonday Wrote: What's the logic behind it?
I suggest sticking to the latest sample data that I showed. Basically, I am observing data from a machine that outputs numbers in X. When these numbers repeat themselves, it means something is wrong, and the machine then outputs the failure information in Y. That is why I need to know the value of Y when X has repeated occurrences: the failure description is tied to the number shown in Y.
The reason I need the information only once is that the failure in Y can also repeat itself at a later time point, so even if I used the groupby method on Y, it would not separate these occurrences. For example, Y may be [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3], so I need to know that failure 5 occurred twice in this time series, not the sum of it. I hope this is clear, and thanks for your patience.
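Counting separate occurrences is what `itertools.groupby` does naturally: unlike grouping a whole column by value, it yields each run of consecutive equal values on its own, so two separate runs of 5 count as two. A minimal sketch, assuming a failure counts once per run of at least 5 identical values and that 0 means "no failure" (both are assumptions; `count_long_runs` is a hypothetical helper):

```python
from collections import Counter
from itertools import groupby

def count_long_runs(values, min_len=5, ignore=(0,)):
    """Count, per value, how many separate runs of >= min_len consecutive copies occur."""
    counts = Counter()
    for value, group in groupby(values):
        if value in ignore:
            continue  # groupby moves on to the next run even if this one is not consumed
        if sum(1 for _ in group) >= min_len:
            counts[value] += 1
    return counts

Y = [5,5,5,5,5,1,2,3,5,5,5,5,5,0,1,3,3,3,3,3]
print(count_long_runs(Y))  # Counter({5: 2, 3: 1})
```

To count failures detected from runs in X rather than runs in Y, the same `Counter` can instead be fed the Y value found at the start of each long X run.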