##### Basic data analysis and predictions
jefsummers (Mar-09-2020, 02:01 AM):

You probably only need degree=2, but it matters little. Look at this example in the scikit-learn docs: https://scikit-learn.org/stable/auto_exa...t_ols.html
How good the fit is can be assessed by the mean squared error and also by how well the model predicts the validation set.

mates (Mar-10-2020, 02:15 PM):

Hey Jeff, I was looking at that link, so I tried to do a similar thing with my polynomial regression, but I got stuck. Can you maybe give me a hint whether I'm doing it right? Is my code for polynomial regression good?

```
array_train = train_dataset.values
y_train = array[:,1].reshape(-1, 1)
X_train = array[:,0].reshape(-1, 1)

array_test = test_dataset.values
y_test = array[:,1].reshape(-1, 1)
X_test = array[:,0].reshape(-1, 1)

poly = PolynomialFeatures(degree = 4)
X_poly_train = poly.fit_transform(X_train)
poly.fit(X_poly, y_train)
lin2 = LinearRegression()
lin2.fit(X_poly_train, y_train)

plt.scatter(X_train, y_train, color = 'blue')
plt.plot(X_train, lin2.predict(poly.fit_transform(X)), color = 'red')
plt.title('Polynomial Regression')
plt.show()

y_pred = lin2.predict(X_test)
print (y_pred)
print('Coefficient of determination: %.2f' % r2_score(y_test, y_pred))
```

```
Error:
ValueError Traceback (most recent call last)
 in
    148 plt.show()
    149
--> 150 y_pred = lin2.predict(X_test)
    151
    152 print (y_pred)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\linear_model\base.py in predict(self, X)
    219         Returns predicted values.
    220         """
--> 221         return self._decision_function(X)
    222
    223     _preprocess_data = staticmethod(_preprocess_data)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\linear_model\base.py in _decision_function(self, X)
    204         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    205         return safe_sparse_dot(X, self.coef_.T,
--> 206                                dense_output=True) + self.intercept_
    207
    208     def predict(self, X):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\utils\extmath.py in safe_sparse_dot(a, b, dense_output)
    140         return ret
    141     else:
--> 142         return np.dot(a, b)
    143
    144

ValueError: shapes (29,1) and (5,1) not aligned: 1 (dim 1) != 5 (dim 0)
```

jefsummers (Mar-11-2020, 12:23 AM):

I'm confused regarding your datasets. Assuming you are starting with a dataset with two columns, one for year and the other for population, let's make those the column names. You split the dataset into test_set, valid_set and train_set.

Try this. I can't test it without your full code and CSV, but this should get you close.

```
poly = PolynomialFeatures(2)
X = train_set['year']
y = train_set['population']
poly.fit_transform(X)
lm = linear_model.LinearRegression()
lm.fit(X, y)
```

mates (Mar-11-2020, 09:01 PM):

I'm sorry, I will upload my CSV file and my Python file so you can check. I tried to do the polynomial regression as you suggested. In the file you can see how far I've come.

Attachments: Python_file, csv_file

jefsummers (Mar-12-2020, 12:46 AM):

Spent some time on this tonight. Note that in my version I read the CSV differently, from a different source (my Google Drive).
It still has errors, but the plot at the end shows the modeled values. The next step is to plot the actual vs the modeled, and do stats on them if you want.

```
# Load libraries
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Load dataset
from google.colab import drive
drive.mount('/content/drive')
filename = (r'/content/drive/My Drive/analyza_casovych_radov.csv')
cols = ['Rok', 'Pocet prepravenych cestujucich', ]
dataset = pd.read_csv(filename, names=cols)
df = dataset

trainval_dataset = df.sample(frac=0.8,random_state=42)
test_dataset = df.drop(trainval_dataset.index)
train_dataset = trainval_dataset.sample(frac=0.8, random_state=42)
validate_dataset = trainval_dataset.drop(train_dataset.index)

print ()
print(f"Train {train_dataset.shape} Validate {validate_dataset.shape} Test {test_dataset.shape}")
print ()
print ()
print ('train_dataset= ')
print (train_dataset)
print ()
print ('test_dataset= ')
print (test_dataset)
print ()
print ('validate_dataset= ')
print (validate_dataset)
print()

X = train_dataset['Rok']
y = train_dataset['Pocet prepravenych cestujucich']

poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X.to_frame().values.reshape(-1, 1))
poly.fit(X_poly, y)
lin2 = LinearRegression()
lin2.fit(X_poly, y)

plt.scatter(X.values, y, color = 'blue')

plt.plot(X.values, lin2.predict(poly.fit_transform(X_poly)), color = 'red')
plt.title('Polynomial Regression')
plt.xlabel('Rok')
plt.ylabel('Other')
plt.show()

#print (poly.fit_transform(X))
#plt.scatter(X, y, color = 'blue')
#plt.plot(X, (poly.fit_transform(X)), color = 'red')
#plt.show()
```

```
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
 in ()
     64 plt.scatter(X.values, y, color = 'blue')
     65
---> 66 plt.plot(X.values, lin2.predict(poly.fit_transform(X_poly)), color = 'red')
     67 plt.title('Polynomial Regression')
     68 plt.xlabel('Rok')

2 frames
/usr/local/lib/python3.6/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
    149             ret = np.dot(a, b)
    150         else:
--> 151             ret = a @ b
    152
    153         if (sparse.issparse(a) and sparse.issparse(b)

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 10)
```

mates (Mar-14-2020, 09:06 PM):

Thank you very much Jef for your time :)
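
Both tracebacks in this thread appear to come down to a feature-count mismatch between fitting and predicting. In the first, `lin2` was fitted on 5 polynomial columns (degree 4) but `predict` received the raw one-column `X_test`. In the second, `poly.fit_transform` was applied to `X_poly`, which already had 3 columns, so a degree-2 expansion produced 10 columns while the model expected 3. The sketch below shows one way to keep the shapes consistent: expand the training column once with `fit_transform`, then reuse the same fitted `PolynomialFeatures` via `transform` on the held-out rows. The data here is synthetic and purely hypothetical (the thread's CSV is not available), and the names `years`, `passengers` and the every-fifth-row split are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for the thread's CSV: a noisy quadratic trend per year.
rng = np.random.default_rng(42)
years = np.arange(1990, 2020, dtype=float)
passengers = 5.0 * (years - 1990) ** 2 + 100.0 + rng.normal(0.0, 20.0, years.size)

# Simple hold-out split (an assumption): every fifth row goes to the test set.
test_mask = np.arange(years.size) % 5 == 0
X_train = years[~test_mask].reshape(-1, 1)   # 2-D, one column
y_train = passengers[~test_mask]
X_test = years[test_mask].reshape(-1, 1)
y_test = passengers[test_mask]

# Expand the single raw column into [1, x, x^2].
poly = PolynomialFeatures(degree=2)
X_poly_train = poly.fit_transform(X_train)   # fit the expansion on training data
X_poly_test = poly.transform(X_test)         # reuse it on the raw test column

# Fit on the expanded training features only.
model = LinearRegression()
model.fit(X_poly_train, y_train)

# Predict from the expanded test features, never from raw X_test
# and never from features that were expanded twice.
y_pred = model.predict(X_poly_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"train features {X_poly_train.shape}, test features {X_poly_test.shape}")
print(f"Mean squared error: {mse:.2f}")
print(f"Coefficient of determination: {r2:.3f}")
```

For plotting the fitted curve, the same rule would apply: pass `poly.transform(X_train)` (the raw column, expanded once) to `model.predict`, and sort the x values first if the rows were sampled in random order.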
