Python Forum

Full Version: How to update trainSet on each iteration
Hi,
I am using the random forest method to predict a response variable. My train set is 70% of the input data and my test set is 30%. While predicting the 30%, I want to add the i-th test row to the train set after each prediction, as described below:
for example
My Input data
1
2
3
4
5
6
7
8
9
10


My train set (initially):
1
2
3
4
5
6
7



My test set
8
9
10

After I predict the 8th row (the first prediction), the train set will update to 1~8; when I predict the 9th row, the train set will update to 1~9, and so on.


My actual


My code is below:
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 27 21:33:14 2018

@author: user
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
dataFileName='RandomForestInput.xlsx'
sheetName='Data'
dataRaw=pd.read_excel(dataFileName,sheet_name=sheetName)
noData=len(dataRaw)
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

labels=['x1','x2','x3']
x=dataRaw[labels]
y=dataRaw['y']

X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.1,random_state=0)

sc=StandardScaler()
sc.fit(X_train)
x_std=sc.transform(x)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)

from sklearn.svm import SVC
from numpy import stack
from sklearn.metrics import accuracy_score
from sklearn.svm import SVR

linear_svm=SVC(kernel='linear')
linear_svm.fit(X_train_std,Y_train)
y_pred=linear_svm.predict(X_test_std)
coef=linear_svm.coef_[0]
coef=np.absolute(coef)
svm_indices=np.argsort(coef)[::-1]
print('Linear SVM')
print("Accuracy: %.2f" %accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %-*s %f" % (f+1,30,labels[svm_indices[f]],coef[svm_indices[f]])))

from sklearn.ensemble import RandomForestClassifier
from numpy import stack
from sklearn.metrics import accuracy_score
forest=RandomForestClassifier(criterion='entropy',n_estimators=100,random_state=1,n_jobs=2)
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
importances=forest.feature_importances_
indices=np.argsort(importances)[::-1]
print('RandonForest')
print("Accuracy: %.2f" % accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %-*s %f" %(f+1,30,labels[indices[f]],importances[indices[f]])))
%=====
The SVM method works, but the random forest method gives the error below:
Output:
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
Error:
Traceback (most recent call last):
  File "<ipython-input-20-b9629da5b974>", line 1, in <module>
    runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py", line 56, in <module>
    y_pred=forest.predict(X_test)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 534, in predict
    proba = self.predict_proba(X)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 573, in predict_proba
    X = self._validate_X_predict(X)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 352, in _validate_X_predict
    raise NotFittedError("Estimator not fitted, "
NotFittedError: Estimator not fitted, call `fit` before exploiting the model.
(Apr-27-2018, 03:25 PM)Raj Wrote: NotFittedError: Estimator not fitted, call `fit` before exploiting the model.
Try calling fit() first :p

(Apr-27-2018, 03:25 PM)Raj Wrote:
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
You're not calling fit, you're replacing the function with a tuple.
Where should I call fit() in my code?
Probably right where you're already almost calling it, in the code I already quoted. There's an = that shouldn't be there.
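To make the difference concrete, here is a minimal sketch that contrasts the assignment with the call. The tiny `X` and `y` arrays and the names `broken` and `fixed` are made up for illustration and are not from the original script.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

# Toy data, invented for this illustration only.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Assignment: this overwrites the fit method with a tuple and trains nothing.
broken = RandomForestClassifier(n_estimators=10, random_state=1)
broken.fit = (X, y)
try:
    broken.predict(X)
    broken_error = None
except NotFittedError as exc:
    broken_error = type(exc).__name__
print(type(broken.fit).__name__, broken_error)

# Call: this actually trains the forest, so predict() works afterwards.
fixed = RandomForestClassifier(n_estimators=10, random_state=1)
fixed.fit(X, y)
fixed_pred = fixed.predict(X)
print(fixed_pred)
```

The only change between the two halves is dropping the `=` after `fit`.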
I ran the code successfully:
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
RandonForest
Accuracy: 0.25
1) x2 0.405015
2) x1 0.310160
3) x3 0.284826

But I have a follow-up question:

My initial train set is 90% and the test set is 10%. I want to update the train set on each iteration, like this:
If my total data set has 10 rows and the initial train set size is 7 (1~7), I predict 8, 9, 10. After I predict the 8th row, my train set becomes 8 rows (1~8) to predict the 9th; after predicting the 9th, the train set updates to 1~9 to predict the 10th.
Ok :)
You're reading the input from an Excel sheet, correct? So whatever the predicted values are, append them to the same Excel sheet as a new row.
Yes, I am reading the data from an xlsx sheet, but the sheet contains the total data set. What I want is:

1. Read data from the main input file
2. Split the dataset into trainSet & testSet in row order (I do not want to split randomly)
3. On each iteration (prediction), append the i-th row of the testSet to the trainSet before the next prediction
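
The three steps above could be sketched roughly as follows. This is only one possible approach, and it rests on some assumptions: the rows are already in the order you want, the true `y` of a test row is known once that row has been predicted (so the real row, not the prediction, is appended), and refitting the forest from scratch on every step is affordable. The function name `walk_forward_predict` and the synthetic `X_demo`/`y_demo` arrays are hypothetical stand-ins; with the real spreadsheet you would presumably pass `dataRaw[labels].values` and `dataRaw['y'].values` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def walk_forward_predict(X, y, n_initial_train):
    """Predict rows n_initial_train..end one at a time, growing the train set."""
    X = np.asarray(X)
    y = np.asarray(y)
    predictions = []
    train_sizes = []
    for i in range(n_initial_train, len(X)):
        # Train on rows 0..i-1 only (ordered split, no shuffling).
        forest = RandomForestClassifier(n_estimators=100, random_state=1)
        forest.fit(X[:i], y[:i])
        # Predict row i; X[i:i + 1] keeps the 2-D shape that predict() expects.
        predictions.append(forest.predict(X[i:i + 1])[0])
        train_sizes.append(i)
        # On the next pass the slice X[:i + 1] includes row i with its true label.
    return np.array(predictions), train_sizes


# Synthetic stand-in for the spreadsheet: 10 rows, 3 features (x1, x2, x3).
rng = np.random.RandomState(0)
X_demo = rng.rand(10, 3)
y_demo = (X_demo[:, 0] > 0.5).astype(int)

preds, sizes = walk_forward_predict(X_demo, y_demo, n_initial_train=7)
print(sizes)  # train-set sizes used for rows 8, 9, 10: [7, 8, 9]
print(preds)
```

Slicing with `X[:i]` avoids physically appending rows: the train set simply grows by one row each time `i` increases.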