Sep-23-2020, 06:43 PM
When I ran some Python 3.7.3 code, I got the following error.
Error:
ValueError Traceback (most recent call last)
<ipython-input-26-6bc623b2ec79> in <module>
3
4 # Fit linear regression
----> 5 lin_reg_mod.fit(X_train, y_train)
6
7 # Make prediction on the testing data
c:\users\newport_j\appdata\local\programs\python\python37\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
501 else:
502 self.coef_, self._residues, self.rank_, self.singular_ = \
--> 503 linalg.lstsq(X, y)
504 self.coef_ = self.coef_.T
505
c:\users\newport_j\appdata\local\programs\python\python37\lib\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
1219 if info < 0:
1220 raise ValueError('illegal value in %d-th argument of internal %s'
-> 1221 % (-info, lapack_driver))
1222 resids = np.asarray([], dtype=x.dtype)
1223 if m > n:
ValueError: illegal value in 4-th argument of internal None
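A note on reading the last line of that traceback, offered as a sketch rather than a diagnosis: the frame from scipy's basic.py shows the message being built from `-info` and `lapack_driver` when `info < 0`. Plugging in values that reproduce the text above shows that "internal None" only means no driver name was passed in explicitly. Here `info = -4` and `lapack_driver = None` are inferred from the message, and `format_lapack_error` is a hypothetical helper, not part of scipy.

```python
def format_lapack_error(info, lapack_driver):
    """Build the message the same way the traceback frame does (hypothetical helper)."""
    return 'illegal value in %d-th argument of internal %s' % (-info, lapack_driver)


# Assumed values, inferred from the error text: info = -4, no driver name given.
msg = format_lapack_error(-4, None)
print(msg)
```

So the message says the underlying LAPACK routine rejected its 4th argument. It does not by itself say which part of the data or the environment caused that.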
The code shown below produces this error.

#!/usr/bin/env python
# coding: utf-8

# In[2]:

# Used for plotting data
get_ipython().run_line_magic('matplotlib', 'inline')
import matplotlib.pyplot as plt

# Used for data storage and manipulation
import numpy as np
import pandas as pd

# Used for Regression Modelling
from sklearn.linear_model import LinearRegression
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Used for Acc metrics
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# For stepwise regression
import statsmodels.api as sm

# box plots
import seaborn as sns
# pairplot
from seaborn import pairplot
# Correlation plot
from statsmodels.graphics.correlation import plot_corr

# In[3]:

# Load your data
data = pd.read_csv("NFL data.csv")

# In[4]:

# adding .head() to your dataset allows you to see the first rows in the dataset.
# Add a # inside the brackets to specify how many rows are returned or else 5 rows are returned.
print(data.shape)  # (12144, 18)
data.head()

# In[5]:

# check for the null values in each column
data.isna().sum()

# In[6]:

# Gives you useful info about your data
data.info()

# In[7]:

# Gives you summary statistics on your numeric columns
data.describe()

# In[8]:

# return only rows where the year is greater than 2009
current = data[(data['schedule_season'] > 2009)]

# In[9]:

# no warning message and no exception is raised
pd.options.mode.chained_assignment = None  # default='warn'

# Create a column titled home or away. This column will add a 1 to the row where the New England Patriots played at home
# and a 0 for away games.
current['home_or_away'] = np.where(current['team_home'] == 'New England Patriots', 1, 0)

# In[10]:

# Return rows where New England Patriots are either the home or away team
current2 = current.loc[(current["team_home"] == "New England Patriots") | (current["team_away"] == "New England Patriots")]

# filter to certain columns
final = current2.filter(["team_home", "team_away", "score_home", "score_away", "weather_temperature", "home_or_away", "over_under_line"])

# merge score_away & score_home into column 'score'
final['score'] = np.where(final['team_away'] == 'New England Patriots', final['score_away'], final['score_home'])

# Before showing our final dataset we will drop any rows with NA values.
final = final.dropna()
final.head()

# In[11]:

final['2_game_avg'] = final.score.rolling(window=2).mean()
final['5_game_avg'] = final.score.rolling(window=5).mean()
final.head()

# In[12]:

final = final.fillna(final.mean())

# In[13]:

# This time we're checking for Outliers. Check each columns min & max to make sure the # is plausible
final.describe()

# In[14]:

# no warning message and no exception is raised
# pd.options.mode.chained_assignment = None  # default='warn'

# In[15]:

df = final[['weather_temperature', 'over_under_line', 'home_or_away', '2_game_avg', '5_game_avg', 'score']]

# In[16]:

df.info()

# In[17]:

# Need to convert three columns to float64 Dtype
df['home_or_away'] = df['home_or_away'].astype('float64')
df['over_under_line'] = df['over_under_line'].astype('float64')
df['score'] = df['score'].astype('float64')
df.info()

# In[18]:

plt.scatter(df['weather_temperature'], df['score'], color='red')
plt.title('weather temperature Vs Score', fontsize=14)
plt.xlabel('weather_temperature', fontsize=14)
plt.ylabel('Score', fontsize=14)
plt.grid(True)

# In[19]:

plt.scatter(df['over_under_line'], df['score'], color='red')
plt.title('over_under_line Vs Score', fontsize=14)
plt.xlabel('over_under_line', fontsize=14)
plt.ylabel('Score', fontsize=14)
plt.grid(True)

# In[20]:

plt.scatter(df['2_game_avg'], df['score'], color='red')
plt.title('2 game average Vs Score', fontsize=14)
plt.xlabel('2 game average', fontsize=14)
plt.ylabel('Score', fontsize=14)
plt.grid(True)

# In[21]:

plt.scatter(df['5_game_avg'], df['score'], color='red')
plt.title('5 game average Vs Score', fontsize=14)
plt.xlabel('5 game average', fontsize=14)
plt.ylabel('Score', fontsize=14)
plt.grid(True)

# In[22]:

sns.boxplot(x="home_or_away", y="score", data=df, palette="Set2")

# In[23]:

corr = df.corr()
corr

# In[24]:

# More optional EDA
pairplot(df)

# In[25]:

# More optional EDA
fig = plot_corr(corr, xnames=corr.columns)

# In[26]:

X = pd.DataFrame(df, columns=['2_game_avg', 'home_or_away'])
y = pd.DataFrame(df, columns=['score'])

# WITH a random_state parameter:
# (Same split every time! Note you can change the random state to any integer.)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Print the first element of each object.
print(X_train.head(1))
print(X_test.head(1))
print(y_train.head(1))
print(y_test.head(1))

# In[27]:

# Create linear regression model
lin_reg_mod = LinearRegression()

# Fit linear regression
lin_reg_mod.fit(X_train, y_train)

# Make prediction on the testing data
pred = lin_reg_mod.predict(X_test)

# In[28]:

# Get the slope and intercept of the line best fit.
print(lin_reg_mod.intercept_)
print(lin_reg_mod.coef_)

# In[29]:

# Calculate the Root Mean Square Error between the actual & predicted
test_set_rmse = (np.sqrt(mean_squared_error(y_test, pred)))

# Calculate the R^2 or coefficient of determination between the actual & predicted
test_set_r2 = r2_score(y_test, pred)

# Note that for rmse, the lower that value is, the better the fit
print(test_set_rmse)
# The closer towards 1, the better the fit
print(test_set_r2)

# In[30]:

df_results = y_test
df_results['Predicted'] = pred.ravel()
df_results['Residuals'] = abs(df_results['score']) - abs(df_results['Predicted'])
print(df_results)

# In[34]:

# Residual plot using df_result
fig = plt.figure(figsize=(10, 7))
sns.residplot(x="Predicted", y="score", data=df_results, color='blue')

# Title and labels.
plt.title('Residuals', size=24)
plt.xlabel('Predicted', size=18)
plt.ylabel('Residual', size=18);

# In[33]:

# Plotting the actual vs predicted values
sns.lmplot(x='score', y='Predicted', data=df_results, fit_reg=False)

line_coords = np.arange(df_results.score.min().min(), df_results.Predicted.max().max())
plt.plot(line_coords, line_coords,  # X and y points
         color='darkorange', linestyle='--')
plt.xlabel('Actual Score', size=10)
plt.title('Actual vs. Predicted')

# In[30]:

# Plotting the residuals distribution
plt.subplots(figsize=(12, 6))
plt.title('Distribution of Residuals')
sns.distplot(df_results['Residuals'])
plt.show()

# In[35]:

df2 = df[['2_game_avg', 'home_or_away', 'score']]
corr2 = df2.corr()

# In[36]:

fig = plot_corr(corr2, xnames=corr2.columns)

The error is not clear to me. I am not that sophisticated a Python programmer. I do not think this has to do with any coding fundamentals.
This link gives a good explanation of why this error happens and how to fix it. I just do not see how to apply it to my code.
https://stackoverflow.com/questions/6256...ng-sklearn
I cannot attach the NFL*.csv file, it is too big; even zipped it is too big.
I think the solution is in that link. I am not sure how to fix my code though.
Any help appreciated. Thanks in advance.
Respectfully,
ErnestTBass
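One way to narrow this down, sketched under assumptions: check that the arrays handed to `fit` are numeric and finite, and cross-check the same least-squares problem with `numpy.linalg.lstsq`. The helper names (`summarize_inputs`, `fit_with_numpy`) and the demo data below are made up for illustration; with the real notebook, `X_train` and `y_train` would be passed instead. This does not confirm the cause of the error, it only helps rule the input data in or out.

```python
import numpy as np


def summarize_inputs(X, y):
    """Report shapes and counts of NaN/inf values in the regression inputs."""
    X_arr = np.asarray(X, dtype=np.float64)  # also accepts a pandas DataFrame
    y_arr = np.asarray(y, dtype=np.float64)
    return {
        "X_shape": X_arr.shape,
        "y_shape": y_arr.shape,
        "X_nan": int(np.isnan(X_arr).sum()),
        "X_inf": int(np.isinf(X_arr).sum()),
        "y_nan": int(np.isnan(y_arr).sum()),
        "y_inf": int(np.isinf(y_arr).sum()),
    }


def fit_with_numpy(X, y):
    """Ordinary least squares with an intercept, solved by numpy.linalg.lstsq."""
    X_arr = np.asarray(X, dtype=np.float64)
    y_arr = np.asarray(y, dtype=np.float64).reshape(len(X_arr), -1)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X_arr)), X_arr])
    coef, _, rank, _ = np.linalg.lstsq(design, y_arr, rcond=None)
    return coef[0], coef[1:], rank


# Made-up stand-in for the two predictors (a rolling average and a 0/1 flag).
rng = np.random.default_rng(0)
X_demo = np.column_stack([rng.uniform(10, 40, 50), rng.integers(0, 2, 50)])
y_demo = 3.0 + 0.8 * X_demo[:, 0] + 2.0 * X_demo[:, 1]

report = summarize_inputs(X_demo, y_demo)
intercept, slopes, rank = fit_with_numpy(X_demo, y_demo)
print(report)
print(intercept, slopes.ravel(), rank)
```

If the report shows no NaN or inf values and the numpy fit succeeds on the real data, the inputs are probably fine and the environment (library versions, platform) is the next place to look.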