Python Forum
Linear Regression on Time Series - Printable Version

Linear Regression on Time Series - karlito - Jan-27-2020

Hi,

I'm trying to fit a simple linear regression to my time series dataset so I can make linear predictions, but I get the error below and don't know how to handle it. Any ideas?

# print df.head()

                 eie
Date_Time	
2017-11-10	  4470.76
2017-11-11	  5465.72
2017-11-12	  15465.72
2017-11-13	  25465.72
2017-11-14	  21480.59


y = np.array(df.values, dtype=float)
x = np.array(pd.to_datetime(df['eie']).index.values, dtype=float)

slope, intercept, r_value, p_value, std_err =sp.linregress(x,y)

xf = np.linspace(min(x),max(x),100)
xf1 = xf.copy()
xf1 = pd.to_datetime(xf1)
yf = (slope*xf)+intercept

print('r = ', r_value, '\n', 'p = ', p_value, '\n', 's = ', std_err)
# Error
Error:
ValueError                                Traceback (most recent call last)
<ipython-input-13-5d30a02ce6af> in <module>
      1 y = np.array(df.values, dtype=float)
----> 2 x = np.array(pd.to_datetime(df['eie']).index.values, dtype=float)
      3
      4 slope, intercept, r_value, p_value, std_err =sp.linregress(x,y)
      5
ValueError: could not convert string to float: '2017-11-10'



RE: Linear Regression on Time Series - buran - Jan-27-2020

The error is clear - string '2017-11-10' could not be converted to float (obviously)
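
To make the cause concrete, here is a minimal sketch. It assumes, based on the error message, that the index of `df` holds date strings rather than datetimes; the small frame below is a made-up stand-in for the real data. Note that `pd.to_datetime(df['eie'])` converts the value column and carries the index along unchanged, so `.index.values` is the same as `df.index.values`.

```python
import numpy as np
import pandas as pd

# Stand-in for the poster's frame; the string index is an assumption
# inferred from the traceback, not something shown in the post.
df = pd.DataFrame(
    {"eie": [4470.76, 5465.72, 15465.72]},
    index=pd.Index(
        ["2017-11-10", "2017-11-11", "2017-11-12"], dtype=object, name="Date_Time"
    ),
)

# The index values are plain Python strings, not timestamps.
idx_values = df.index.values
index_is_still_strings = all(isinstance(v, str) for v in idx_values)

# Asking NumPy to turn those strings into floats is what raises the ValueError.
try:
    np.array(idx_values, dtype=float)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Converting the index itself (not the 'eie' column) gives real datetimes.
dt_index = pd.to_datetime(df.index)
print(index_is_still_strings, conversion_failed, type(dt_index).__name__)
```

If the real index is already a `DatetimeIndex`, the traceback would look different, so checking `df.index.dtype` is a quick first step.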


RE: Linear Regression on Time Series - karlito - Jan-27-2020

(Jan-27-2020, 02:22 PM)buran Wrote: The error is clear - string '2017-11-10' could not be converted to float (obviously)

Yes, I can read :) But for regression purposes, I read that all dates should be passed through pandas' 'to_datetime()' function to convert them to numeric floats, because the corresponding dates will be stored in the 'x' variable.
NB: Date_Time had already been converted with 'to_datetime()' before I set it as the index. I'm kind of lost. Any ideas?


RE: Linear Regression on Time Series - buran - Jan-27-2020

maybe
import pandas as pd
import numpy as np
df = pd.DataFrame([['2017-11-10', 4470.76], ['2017-11-11', 5465.72], ['2017-11-12', 15465.72]], columns=['Date_Time', 'eie'])
y = np.array(df['eie'], dtype=float)
x = np.array(pd.to_datetime(df['Date_Time'], format='%Y-%m-%d'), dtype=float)
print(y)
print(x)
Output:
[ 4470.76  5465.72 15465.72]
[1.5102720e+18 1.5103584e+18 1.5104448e+18]
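
The large values in `x` above are nanoseconds since the Unix epoch (1970-01-01). If smaller regressors are preferred, one possible approach is to count the days elapsed since the first observation. The sketch below reuses the same sample frame; the names `dates` and `days` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["2017-11-10", 4470.76], ["2017-11-11", 5465.72], ["2017-11-12", 15465.72]],
    columns=["Date_Time", "eie"],
)
dates = pd.to_datetime(df["Date_Time"], format="%Y-%m-%d")

# Subtracting the first date yields timedeltas; dividing by a one-day
# Timedelta turns them into plain floats (0.0, 1.0, 2.0, ...).
days = ((dates - dates.iloc[0]) / pd.Timedelta(days=1)).to_numpy(dtype=float)
print(days)
```

Both encodings describe the same straight line; only the units of the slope change (per nanosecond versus per day).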



RE: Linear Regression on Time Series - karlito - Jan-28-2020

(Jan-27-2020, 03:00 PM)buran Wrote: maybe

Thanks for your effort, but it doesn't really help me, sorry. I was hoping for something like

Output:
[ 4470.76  5465.72 15465.72]
[1 2 3]


or even better

Output:
[ 4470.76  5465.72 15465.72]
[2017-11-10 2017-11-11 2017-11-12]



RE: Linear Regression on Time Series - buran - Jan-28-2020

I think there is some confusion in your understanding, but anyway

import pandas as pd
import numpy as np
df = pd.DataFrame([['2017-11-10', 4470.76], ['2017-11-11', 5465.72], ['2017-11-12', 15465.72]], columns=['Date_Time', 'eie'])
y = np.array(df['eie'], dtype=float)
x = np.array(pd.to_datetime(df['Date_Time'], format='%Y-%m-%d'), dtype='datetime64[D]')
print(y)
print(x)
Output:
[ 4470.76  5465.72 15465.72]
['2017-11-10' '2017-11-11' '2017-11-12']
or
import pandas as pd
import numpy as np
df = pd.DataFrame([['2017-11-10', 4470.76], ['2017-11-11', 5465.72], ['2017-11-12', 15465.72]], columns=['Date_Time', 'eie'])
y = np.array(df['eie'], dtype=float)
x = np.arange(1, len(df) + 1)  # 1-based row position as the regressor: 1, 2, 3
print(y)
print(x)
Output:
[ 4470.76  5465.72 15465.72]
[1 2 3]
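
Putting the pieces together, a regression over the dates could look roughly like the sketch below. It assumes that `sp` in the first post refers to `scipy.stats`, uses days since the first date as the regressor, and fits only the five rows shown in `df.head()`. The names `result`, `fit_values` and `fit_dates` are illustrative, not from the original code.

```python
import numpy as np
import pandas as pd
from scipy import stats

# The five rows from df.head() in the first post, with a real DatetimeIndex.
df = pd.DataFrame(
    {"eie": [4470.76, 5465.72, 15465.72, 25465.72, 21480.59]},
    index=pd.to_datetime(
        ["2017-11-10", "2017-11-11", "2017-11-12", "2017-11-13", "2017-11-14"]
    ),
)
df.index.name = "Date_Time"

# Regressor: days elapsed since the first timestamp (0, 1, 2, ...).
x = ((df.index - df.index[0]) / pd.Timedelta(days=1)).to_numpy(dtype=float)
# Select the column so y is 1-D; df.values would be 2-D with shape (n, 1).
y = df["eie"].to_numpy(dtype=float)

result = stats.linregress(x, y)

# Evaluate the fitted line on an even grid, then map the grid back to
# timestamps so the line can be plotted against real dates.
xf = np.linspace(x.min(), x.max(), 100)
fit_values = result.slope * xf + result.intercept
fit_dates = df.index[0] + pd.to_timedelta(xf, unit="D")

print("r =", result.rvalue, "p =", result.pvalue, "s =", result.stderr)
```

Here the slope is in units of `eie` per day. `fit_dates` and `fit_values` can be passed straight to a plotting call alongside the original series.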