Hi,
I am using a random forest to predict a response variable. My training set is 70% of the input data and my test set is the remaining 30%. When predicting the test set, I want to add each i-th test row to the training set after it has been predicted, as described below.
For example:
My input data:
1
2
3
4
5
6
7
8
9
10
My training set (initially):
1
2
3
4
5
6
7
My test set:
8
9
10
After I predict the 8th row (the first prediction), the training set is updated to rows 1~8; after I predict the 9th row, it is updated to rows 1~9, and so on.
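The scheme described above can be sketched roughly as follows. This is only a minimal sketch on made-up data, not the actual script: `expanding_window_predict` and `MajorityClassifier` are hypothetical names introduced for illustration, and the stand-in model simply predicts the most common training label so that the example runs without any extra packages.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def expanding_window_predict(make_model, X, y, n_initial):
    """Predict rows n_initial..end one at a time, refitting on all earlier rows."""
    predictions = []
    train_sizes = []
    for i in range(n_initial, len(X)):
        model = make_model()        # fresh, unfitted model for every iteration
        model.fit(X[:i], y[:i])     # train on the first i rows only
        predictions.append(model.predict(X[i:i + 1])[0])
        train_sizes.append(i)       # row i + 1 joins the training set next round
    return predictions, train_sizes


X = [[v] for v in range(1, 11)]     # rows 1..10, as in the example above
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # made-up labels
preds, sizes = expanding_window_predict(MajorityClassifier, X, y, n_initial=7)
print(sizes)   # [7, 8, 9]
print(preds)   # [0, 0, 0]
```

Any estimator with `fit` and `predict` should fit into the same loop. For example, with NumPy arrays for `X` and `y`, `make_model` could be `lambda: RandomForestClassifier(n_estimators=100, random_state=1)`. Refitting from scratch on every row is the simplest option but can be slow for large test sets.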
My code is below:
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 27 21:33:14 2018
@author: user
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

dataFileName='RandomForestInput.xlsx'
sheetName='Data'
dataRaw=pd.read_excel(dataFileName,sheetname=sheetName)
noData=len(dataRaw)

import matplotlib.pylab as plt
from sklearn.cross_validation import train_test_split
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

labels=['x1','x2','x3']
x=dataRaw[labels]
y=dataRaw['y']
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.1,random_state=0)
sc=StandardScaler()
sc.fit(X_train)
x_std=sc.transform(x)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)

from sklearn.svm import SVC
from numpy import stack
from sklearn.metrics import accuracy_score
from sklearn.svm import SVR

linear_svm=SVC(kernel='linear')
linear_svm.fit(X_train_std,Y_train)
y_pred=linear_svm.predict(X_test_std)
coef=linear_svm.coef_[0]
coef=np.absolute(coef)
svm_indices=np.argsort(coef)[::-1]
print('Linear SVM')
print("Accuracy: %.2f" %accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %-*s %f" % (f+1,30,labels[svm_indices[f]],coef[svm_indices[f]])))

from sklearn.ensemble import RandomForestClassifier
from numpy import stack
from sklearn.metrics import accuracy_score

forest=RandomForestClassifier(criterion='entropy',n_estimators=100,random_state=1,n_jobs=2)
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
importances=forest.feature_importances_
indices=np.argsort(importances)[::-1]
print('RandonForest')
print("Accuracy: %.2f" % accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %~*s %f" %(f+1,30,labels[indices[f]],importances[indices[f]])))
The SVM method works, but the random forest method gives the error below:
Output:
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
Error:
Traceback (most recent call last):
File "<ipython-input-20-b9629da5b974>", line 1, in <module>
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py", line 56, in <module>
y_pred=forest.predict(X_test)
File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 534, in predict
proba = self.predict_proba(X)
File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 573, in predict_proba
X = self._validate_X_predict(X)
File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 352, in _validate_X_predict
raise NotFittedError("Estimator not fitted, "
NotFittedError: Estimator not fitted, call `fit` before exploiting the model.
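One likely cause, judging from the script above: the line `forest.fit=(X_train,Y_train)` is an assignment, not a call. It stores a tuple in an attribute named `fit` and never trains the forest, so the later `predict` call finds an unfitted estimator. The sketch below reproduces the effect with a hypothetical stand-in class (`TinyModel` is not part of scikit-learn) so that it runs without any extra packages.

```python
class TinyModel:
    """Hypothetical stand-in for an estimator with fit/predict."""

    def __init__(self):
        self.fitted = False

    def fit(self, X, y):
        self.fitted = True
        return self

    def predict(self, X):
        if not self.fitted:
            raise RuntimeError("Estimator not fitted, call `fit` first.")
        return [0 for _ in X]


X_train, Y_train = [[1], [2], [3]], [0, 0, 1]

broken = TinyModel()
broken.fit = (X_train, Y_train)     # assignment: stores a tuple, trains nothing
print(type(broken.fit).__name__)    # tuple
print(broken.fitted)                # False

fixed = TinyModel()
fixed.fit(X_train, Y_train)         # call: actually runs the training code
print(fixed.predict([[4]]))         # [0]
```

Applied to the script, this would mean writing `forest.fit(X_train,Y_train)` (no `=`) before `forest.predict(X_test)`.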