Python Forum

I want to compute the autocorrelation of a series of numbers for lags 1, 2, ..., N.

In this example I have 56 numbers and N = 20.

I have a file whose first 56 columns contain random numbers.
Columns 57 to 76 contain the autocorrelation for lags 1, 2, ..., 20.
Let's say I only load columns 1-56. In this example I have only one row, but there could be several rows as well.

The goal is thus to get this result:

[1]: https://i.stack.imgur.com/MOh6H.png

I have done this in Excel, but how can I do it in Python? And what if I have several rows: how can I compute it for every row?

I found this code, which I think is a good start.

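    import numpy

    def acf(series):
        n = len(series)
        data = numpy.asarray(series)
        mean = numpy.mean(data)
        # Lag-0 autocovariance (the variance), used as the denominator
        c0 = numpy.sum((data - mean) ** 2) / float(n)

        def r(h):
            # Autocovariance at lag h, divided by c0
            acf_lag = ((data[:n - h] - mean) * (data[h:] - mean)).sum() / float(n) / c0
            return round(acf_lag, 3)

        x = numpy.arange(n)  # Lags 0 .. n-1 (lag 0 is included and always equals 1)
        acf_coeffs = list(map(r, x))  # list() because map is lazy in Python 3
        return acf_coeffs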

In the outer function, `n` is the number of values (56 in my example) and `data` is simply the first 56 columns. `c0` is the denominator of the autocorrelation formula.

As far as I can see, `h` is the lag, so in my example it runs from 1 to 20.
`acf_lag` is the autocorrelation formula itself, so that looks okay.
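Written out, the quantity that `r(h)` computes appears to be the following, where $x_1, \dots, x_n$ are the values, $\bar{x}$ is their mean, and $h$ is the lag. The symbols are introduced here for illustration only; the Excel sheet may use a slightly different definition.

```latex
c_0 = \frac{1}{n} \sum_{t=1}^{n} (x_t - \bar{x})^2,
\qquad
r_h = \frac{\dfrac{1}{n} \sum_{t=1}^{n-h} (x_t - \bar{x})(x_{t+h} - \bar{x})}{c_0}
```

Here `data[:n - h]` corresponds to $x_1, \dots, x_{n-h}$ and `data[h:]` to $x_{h+1}, \dots, x_n$.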

What I'm struggling with is this: how do I apply the function to my dataframe so that I get the result I want?

Any help is appreciated!
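One possible way to apply the formula to every row is sketched below. It assumes the 56 values sit in the first 56 columns of a pandas DataFrame. The names `acf_row`, `lag_names`, and the random `df` are made up for illustration and stand in for the real file.

```python
import numpy as np
import pandas as pd

def acf_row(values, max_lag=20):
    """Autocorrelation of one 1-D series for lags 1..max_lag."""
    data = np.asarray(values, dtype=float)
    n = len(data)
    mean = data.mean()
    c0 = np.sum((data - mean) ** 2) / n  # lag-0 autocovariance
    return [
        np.sum((data[: n - h] - mean) * (data[h:] - mean)) / n / c0
        for h in range(1, max_lag + 1)
    ]

# Hypothetical stand-in for the real file: 3 rows x 56 columns of random numbers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 56)))

# One output column per lag; apply(axis=1) calls the function once per row.
lag_names = [f"lag_{h}" for h in range(1, 21)]
acf_df = df.iloc[:, :56].apply(
    lambda row: pd.Series(acf_row(row.to_numpy()), index=lag_names), axis=1
)

# Original 56 columns followed by the 20 autocorrelation columns.
result = pd.concat([df, acf_df], axis=1)
print(result.shape)  # (3, 76)
```

Because `apply(..., axis=1)` works row by row, the same call handles one row or many. Rounding is left out here and can be added with `result.round(3)` if the Excel output is rounded.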

Update:

I tried skipping the `acf` function and computing each piece as a separate variable instead:

    n = len(df.columns)-20
    data = df.iloc[:,:56]
    mean = data.mean(axis=1)
    c0 = np.sum((data - mean) ** 2) / float(n)

Maybe I can define the `r(h)` function afterwards. But `c0` comes out as zero for each of the 56 columns, whereas I expected the row mean to be subtracted from every column (1 to 56) before squaring and summing. That is not happening.
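A likely cause, assuming `data` is a pandas DataFrame and `mean` is a Series of row means, is label alignment: `data - mean` matches the index of `mean` (the row labels) against the column labels of `data`, so cells without a matching label become NaN and later sum to zero. The sketch below uses made-up data and labels to show the effect and one way around it with `DataFrame.sub(..., axis=0)`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 rows x 56 columns, with labels chosen for illustration.
rng = np.random.default_rng(1)
data = pd.DataFrame(
    rng.random((2, 56)),
    index=["row_a", "row_b"],
    columns=[f"c{i}" for i in range(56)],
)
n = data.shape[1]

mean = data.mean(axis=1)  # one mean per row, indexed by row label

# Plain subtraction aligns the labels of `mean` with the column labels.
# No labels match here, so every cell becomes NaN.
misaligned = data - mean
print(misaligned.isna().all().all())  # True

# Subtract along the rows instead, then sum across the columns of each row.
centered = data.sub(mean, axis=0)
c0 = (centered ** 2).sum(axis=1) / n  # one c0 value per row
print(c0)
```

With this change `c0` holds one value per row, which matches the population variance of that row (`data.var(axis=1, ddof=0)`).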

StevenZut