Posts: 1,358
Threads: 2
Joined: May 2019
You probably only need degree=2, but it matters little.
Look at this in the scikit-learn docs - https://scikit-learn.org/stable/auto_exa...t_ols.html
You can assess how good the fit is with the mean squared error, and also by how well the model predicts the validation set.
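A minimal sketch of that kind of check, assuming synthetic data (the poster's CSV is not used here) and hypothetical names such as `X_val` and `model`: fit a degree-2 polynomial on a training split, then report the mean squared error and R^2 on a held-out validation split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): a noisy quadratic trend.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 60).reshape(-1, 1)
y = 3.0 + 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 1.0, 60)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)  # learn the expansion on training data
X_val_poly = poly.transform(X_val)          # reuse the same expansion on validation data

model = LinearRegression().fit(X_train_poly, y_train)
y_val_pred = model.predict(X_val_poly)

val_mse = mean_squared_error(y_val, y_val_pred)
val_r2 = r2_score(y_val, y_val_pred)
print(f"Validation MSE: {val_mse:.3f}")
print(f"Validation R^2: {val_r2:.3f}")
```

Comparing these two numbers across a few degrees (for example 1, 2 and 4) is one simple way to see whether a higher degree actually helps on data the model has not seen.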
Posts: 9
Threads: 1
Joined: Mar 2020
Mar-10-2020, 02:15 PM
(This post was last modified: Mar-10-2020, 02:15 PM by mates.)
Hey Jeff,
I was looking at that link, so I tried to do a similar thing with my polynomial regression, but I got stuck. Can you maybe give me a hint whether I'm doing it right?
Is my code for polynomial regression correct?
array_train = train_dataset.values
y_train = array[:,1].reshape(-1, 1)
X_train = array[:,0].reshape(-1, 1)
array_test = test_dataset.values
y_test = array[:,1].reshape(-1, 1)
X_test = array[:,0].reshape(-1, 1)
poly = PolynomialFeatures(degree = 4)
X_poly_train = poly.fit_transform(X_train)
poly.fit(X_poly, y_train)
lin2 = LinearRegression()
lin2.fit(X_poly_train, y_train)
plt.scatter(X_train, y_train, color = 'blue')
plt.plot(X_train, lin2.predict(poly.fit_transform(X)), color = 'red')
plt.title('Polynomial Regression')
plt.show()
y_pred = lin2.predict(X_test)
print (y_pred)
print('Coefficient of determination: %.2f'
      % r2_score(y_test, y_pred))
Error:
ValueError Traceback (most recent call last)
<ipython-input-10-df595795834c> in <module>
148 plt.show()
149
--> 150 y_pred = lin2.predict(X_test)
151
152 print (y_pred)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
219 Returns predicted values.
220 """
--> 221 return self._decision_function(X)
222
223 _preprocess_data = staticmethod(_preprocess_data)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\linear_model\base.py in _decision_function(self, X)
204 X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
205 return safe_sparse_dot(X, self.coef_.T,
--> 206 dense_output=True) + self.intercept_
207
208 def predict(self, X):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\extmath.py in safe_sparse_dot(a, b, dense_output)
140 return ret
141 else:
--> 142 return np.dot(a, b)
143
144
ValueError: shapes (29,1) and (5,1) not aligned: 1 (dim 1) != 5 (dim 0)
Posts: 1,358
Threads: 2
Joined: May 2019
I'm confused regarding your datasets.
Assuming you are starting with a dataset with two columns, one for year and the other for population, let's use those as the column names.
You split the dataset into:
test_set
valid_set
train_set
Try this. I can't test without your full code and csv, but this should get you close.
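from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(2)
# Double brackets keep X two-dimensional, as scikit-learn expects
X = train_set[['year']]
y = train_set['population']
# Keep the expanded features and fit the model on them
X_poly = poly.fit_transform(X)
lm = linear_model.LinearRegression()
lm.fit(X_poly, y)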
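A fuller, self-contained sketch of the same idea, assuming a synthetic two-column dataset in place of the real CSV (the data and the names `X_train_poly` and `X_test_poly` are illustrative assumptions). It also shows the step that the earlier "shapes (29,1) and (5,1) not aligned" error points at: the test features have to go through the same polynomial expansion, via `transform`, before they are passed to `predict`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the real CSV (assumption): an exact quadratic trend.
years = np.arange(1990, 2020)
df = pd.DataFrame({
    "year": years,
    "population": 1000.0 + 12.0 * (years - 1990) + 0.8 * (years - 1990) ** 2,
})

train_set = df.sample(frac=0.8, random_state=42)
test_set = df.drop(train_set.index)

# Double brackets keep X two-dimensional: (n_samples, 1).
X_train = train_set[["year"]].to_numpy()
y_train = train_set["population"].to_numpy()
X_test = test_set[["year"]].to_numpy()

poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)  # columns: 1, x, x^2
X_test_poly = poly.transform(X_test)        # same three columns for the test rows

lm = LinearRegression()
lm.fit(X_train_poly, y_train)
y_pred = lm.predict(X_test_poly)            # column counts now match the fitted model

print(X_train_poly.shape, X_test_poly.shape, y_pred.shape)
```

Calling `fit_transform` only on the training data and plain `transform` on everything else keeps the feature layout identical between fitting and predicting.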
Posts: 9
Threads: 1
Joined: Mar 2020
I'm sorry, I will upload my csv file and my python file so you can check. I tried to do the polynomial regression as you suggested. In the file you can see how far I've come.
Python_file
csv_file
Posts: 1,358
Threads: 2
Joined: May 2019
Spent some time on this tonight. Note that my version reads the csv differently, from a different source (my Google Drive). It still has errors, but the plot at the end shows the modeled values. The next step is to plot the actual vs. the modeled values, and do stats on them if you want.
# Load libraries
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Load dataset
from google.colab import drive
drive.mount('/content/drive')
filename = (r'/content/drive/My Drive/analyza_casovych_radov.csv')
cols = ['Rok', 'Pocet prepravenych cestujucich', ]
dataset = pd.read_csv(filename, names=cols)
df = dataset
trainval_dataset = df.sample(frac=0.8,random_state=42)
test_dataset = df.drop(trainval_dataset.index)
train_dataset = trainval_dataset.sample(frac=0.8, random_state=42)
validate_dataset = trainval_dataset.drop(train_dataset.index)
print ()
print(f"Train {train_dataset.shape} Validate {validate_dataset.shape} Test {test_dataset.shape}")
print ()
print ()
print ('train_dataset= ')
print (train_dataset)
print ()
print ('test_dataset= ')
print (test_dataset)
print ()
print ('validate_dataset= ')
print (validate_dataset)
print()
X = train_dataset['Rok']
y = train_dataset['Pocet prepravenych cestujucich']
poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X.to_frame().values.reshape(-1, 1))
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)
plt.scatter(X.values, y, color = 'blue')
plt.plot(X.values, lin2.predict(poly.fit_transform(X_poly)), color = 'red')
plt.title('Polynomial Regression')
plt.xlabel('Rok')
plt.ylabel('Other')
plt.show()
#print (poly.fit_transform(X))
#plt.scatter(X, y, color = 'blue')
#plt.plot(X, (poly.fit_transform(X)), color = 'red')
#plt.show()
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-b5736948e378> in <module>()
64 plt.scatter(X.values, y, color = 'blue')
65
---> 66 plt.plot(X.values, lin2.predict(poly.fit_transform(X_poly)), color = 'red')
67 plt.title('Polynomial Regression')
68 plt.xlabel('Rok')
2 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
149 ret = np.dot(a, b)
150 else:
--> 151 ret = a @ b
152
153 if (sparse.issparse(a) and sparse.issparse(b)
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 10)
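The sizes in that message (3 versus 10) are consistent with the plotting line expanding `X_poly` a second time: a degree-2 expansion of one column gives 3 columns, and a degree-2 expansion of those 3 columns gives 10, while `lin2` was fitted on 3. A minimal sketch on synthetic data (an assumption; the real CSV is not used, and `twice`, `order`, `x_sorted` and `y_fit` are illustrative names) reproduces the column counts and shows one way to get the curve by predicting from the already expanded features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): one feature, exact quadratic target.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(20, 1))
y = 1.0 + 0.5 * X[:, 0] ** 2

poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X)            # 3 columns: 1, x, x^2
lin2 = LinearRegression().fit(X_poly, y)  # model expects 3 columns

# Expanding the expanded features again yields 10 columns,
# which no longer matches what lin2 was fitted on.
twice = PolynomialFeatures(2).fit_transform(X_poly)
print(X_poly.shape[1], twice.shape[1])    # prints: 3 10

# Predict from the 3-column features instead, sorted by x so that a
# line plot is drawn left to right rather than zig-zagging.
order = np.argsort(X[:, 0])
x_sorted = X[order, 0]
y_fit = lin2.predict(X_poly[order])
```

With these arrays, `plt.plot(x_sorted, y_fit, color='red')` would draw the fitted curve over the scatter of the original points.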
Posts: 9
Threads: 1
Joined: Mar 2020
Thank you very much, Jeff, for your time :)