Dec-06-2018, 08:41 AM
I want to find the autocorrelation of a series of numbers for lags 1, 2, ..., N.
In this example I have 56 numbers and N = 20.
So I have a file whose first 56 columns are random numbers.
Columns 57 to 76 hold the autocorrelation for lags 1, 2, ..., 20.
Let's say I only load columns 1-56. In this example I have only one row, but there can be several rows as well.
The goal is thus to get this result:
[1]: https://i.stack.imgur.com/MOh6H.png
I have done this in Excel, but how can I do it in Python? Also, if I have several rows, how can I do it for each row?
I found this code and this is a good start, I think.
```python
import numpy

def acf(series):
    n = len(series)
    data = numpy.asarray(series)
    mean = numpy.mean(data)
    c0 = numpy.sum((data - mean) ** 2) / float(n)

    def r(h):
        acf_lag = ((data[:n - h] - mean) * (data[h:] - mean)).sum() / float(n) / c0
        return round(acf_lag, 3)

    x = numpy.arange(n)  # Avoiding lag 0 calculation
    acf_coeffs = map(r, x)
    return acf_coeffs
```

In this function, `n` is the number of values (56 in my example) and `data` is simply the first 56 columns. `c0` is the denominator of the autocorrelation formula.
I see that `h` is the lag, so in my example it runs from 1 to 20.
acf_lag is the actual formula of autocorrelation, so that looks okay.
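For reference, this is the quantity that `r(h)` computes as far as I can tell, written out, where $x_1, \dots, x_n$ are the values of one row and $\bar{x}$ is their mean:

```latex
$$
c_0 = \frac{1}{n}\sum_{t=1}^{n}\left(x_t-\bar{x}\right)^2,
\qquad
r_h = \frac{1}{n\,c_0}\sum_{t=1}^{n-h}\left(x_t-\bar{x}\right)\left(x_{t+h}-\bar{x}\right)
$$
```

Here `data[:n - h]` corresponds to $x_1, \dots, x_{n-h}$ and `data[h:]` to $x_{1+h}, \dots, x_n$.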
The thing I'm struggling with is: how do I write the line of code that applies this to my dataframe so that I get the result I want?
Any help is appreciated!
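One possible way to apply the formula row by row is sketched below. It is only a sketch, and several things in it are assumptions: the random `df` stands in for the real file, and the names `acf_lags`, `max_lag` and `lag_names` are made up for illustration. The idea is to compute lags 1 to 20 for a single row and then let `DataFrame.apply(..., axis=1)` call that function once per row.

```python
import numpy as np
import pandas as pd

def acf_lags(series, max_lag=20):
    """Autocorrelation for lags 1..max_lag, using the same formula as acf."""
    data = np.asarray(series, dtype=float)
    n = len(data)
    mean = data.mean()
    c0 = np.sum((data - mean) ** 2) / n
    return [
        round(float(np.sum((data[:n - h] - mean) * (data[h:] - mean)) / n / c0), 3)
        for h in range(1, max_lag + 1)
    ]

# Hypothetical stand-in for the real file: 3 rows of 56 random numbers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 56)))

# One output column per lag; apply(axis=1) passes each row to the lambda.
lag_names = [f"lag_{h}" for h in range(1, 21)]
acf_df = df.iloc[:, :56].apply(
    lambda row: pd.Series(acf_lags(row, 20), index=lag_names), axis=1
)

# Original 56 columns followed by the 20 autocorrelation columns.
result = pd.concat([df, acf_df], axis=1)
```

Because the function is called once per row, the same code should work for a single row or for many.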
Update:
Instead of defining the function `acf`, I tried to compute each piece as a separate variable:
```python
n = len(df.columns)-20
data = df.iloc[:,:56]
mean = data.mean(axis=1)
c0 = np.sum((data - mean) ** 2) / float(n)
```

Maybe I can define the `r(h)` function afterwards. But `c0` gives me a zero for each of the 56 columns, whereas I expected the mean to be subtracted from every column (1 to 56). That is not happening.
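A possible explanation, as far as I understand pandas alignment: in `data - mean`, the Series `mean` is indexed by row label, but pandas matches it against the column labels of `data`. The labels do not line up, the result is NaN, and the sum skips NaN and returns 0. The sketch below subtracts along the rows instead, using `DataFrame.sub(..., axis=0)`. The random `data` and the names `centered` and `acf_table` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 2 rows, 56 value columns.
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.random((2, 56)), columns=[f"c{i}" for i in range(1, 57)])
n = data.shape[1]

mean = data.mean(axis=1)               # one mean per row
centered = data.sub(mean, axis=0)      # align the means on the row index
c0 = (centered ** 2).sum(axis=1) / n   # one c0 per row

def r(h):
    # Lag-h autocorrelation for every row at once, using positional slices.
    left = centered.iloc[:, :n - h].to_numpy()
    right = centered.iloc[:, h:].to_numpy()
    return (left * right).sum(axis=1) / n / c0.to_numpy()

# One column per lag (1..20), one row per input row.
acf_table = pd.DataFrame(
    {f"lag_{h}": r(h) for h in range(1, 21)}, index=data.index
).round(3)
```

Summing with `.sum(axis=1)` rather than `np.sum` on the whole DataFrame makes it explicit that each row gets its own `c0`.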