Python Forum
How to update trainSet on each iteration
#1
Hi,
I am using the random forest method to predict a response variable. My training set is 70% of the input data and my test set is 30%. But when predicting the 30% test data, I want to add each ith test row to the training set after each prediction, as described below:
For example:

My input data:
1 2 3 4 5 6 7 8 9 10

My train set (initially):
1 2 3 4 5 6 7

My test set:
8 9 10
After I predict the 8th row (the first prediction), the train set will update to 1~8; when I predict the 9th row, the train set will update to 1~9, and so on.
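The update scheme described above is usually called an expanding-window (walk-forward) split. A minimal sketch in plain Python, using a hypothetical helper `expanding_windows` and the 1-based row numbers from the example, just lists which rows are used for training before each prediction:

```python
def expanding_windows(n_rows, n_train):
    """Return (train_rows, test_row) pairs using 1-based row numbers.

    The first n_train rows form the initial training set; each later row
    is predicted once and then added to the training set.
    """
    windows = []
    for test_row in range(n_train + 1, n_rows + 1):
        train_rows = list(range(1, test_row))  # rows 1 .. test_row - 1
        windows.append((train_rows, test_row))
    return windows


for train_rows, test_row in expanding_windows(10, 7):
    print(f"train 1~{train_rows[-1]} -> predict {test_row}")
# train 1~7 -> predict 8
# train 1~8 -> predict 9
# train 1~9 -> predict 10
```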


My actual


My code is below:
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 27 21:33:14 2018

@author: user
"""

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

dataFileName = 'RandomForestInput.xlsx'
sheetName = 'Data'
dataRaw = pd.read_excel(dataFileName, sheet_name=sheetName)
noData = len(dataRaw)

labels = ['x1', 'x2', 'x3']
x = dataRaw[labels]
y = dataRaw['y']

X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.1, random_state=0)

# Standardize features with statistics from the training set only
sc = StandardScaler()
sc.fit(X_train)
x_std = sc.transform(x)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Linear SVM: rank features by the absolute value of their weights
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train_std, Y_train)
y_pred = linear_svm.predict(X_test_std)
coef = np.absolute(linear_svm.coef_[0])
svm_indices = np.argsort(coef)[::-1]
print('Linear SVM')
print("Accuracy: %.2f" % accuracy_score(Y_test, y_pred))
for f in range(X_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30, labels[svm_indices[f]], coef[svm_indices[f]]))

# Random forest: rank features by their importances
forest = RandomForestClassifier(criterion='entropy', n_estimators=100, random_state=1, n_jobs=2)
forest.fit=(X_train,Y_train)
y_pred = forest.predict(X_test)
importances = forest.feature_importances_
indices = np.argsort(importances)[::-1]
print('RandonForest')
print("Accuracy: %.2f" % accuracy_score(Y_test, y_pred))
for f in range(X_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30, labels[indices[f]], importances[indices[f]]))
The SVM method works, but the random forest method gives the error below:
Output:
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
Error:
Traceback (most recent call last):
  File "<ipython-input-20-b9629da5b974>", line 1, in <module>
    runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py", line 56, in <module>
    y_pred=forest.predict(X_test)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 534, in predict
    proba = self.predict_proba(X)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 573, in predict_proba
    X = self._validate_X_predict(X)
  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 352, in _validate_X_predict
    raise NotFittedError("Estimator not fitted, "
NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

Attached file: RandomForestInput.xlsx (9.07 KB)
#2
(Apr-27-2018, 03:25 PM)Raj Wrote: NotFittedError: Estimator not fitted, call fit before exploiting the model.
Try calling fit() first :p

(Apr-27-2018, 03:25 PM)Raj Wrote:
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
You're not calling fit, you're replacing the function with a tuple.
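To see what the stray `=` does, here is a minimal sketch with a hypothetical `Model` class standing in for the estimator. Assigning to `fit` creates an instance attribute that hides the method, so training never happens; that is why scikit-learn's `predict` later raises `NotFittedError`.

```python
class Model:
    """Hypothetical stand-in for an estimator such as RandomForestClassifier."""

    def fit(self, X, y):
        self.is_fitted = True  # mark the model as trained
        return self


wrong = Model()
wrong.fit = ([[0]], [0])            # assignment: stores a tuple under the name "fit"
print(type(wrong.fit).__name__)     # tuple
print(hasattr(wrong, "is_fitted"))  # False: the real fit() never ran

right = Model()
right.fit([[0]], [0])               # call: actually runs the method
print(right.is_fitted)              # True
```

In the script above, the corresponding fix is `forest.fit(X_train, Y_train)`.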
#3
Where should I call fit() in my code?
#4
Probably right where you're already almost calling it, in the code I already quoted. There's an = that shouldn't be there.
#5
I ran the code successfully:
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
RandonForest
Accuracy: 0.25
1) x2 0.405015
2) x1 0.310160
3) x3 0.284826

But here is my follow-up question:

My initial train set is 90% and my test set is 10%. I want to update the train set on each iteration like this:
if my total data set has 10 rows and the initial train set has 7 (1~7), I predict rows 8, 9 and 10. After I predict the 8th row, the train set becomes 1~8 to predict the 9th; after predicting the 9th, the train set updates to 1~9 to predict the 10th.
#6
Ok :)
You're reading the input from an Excel sheet, correct? So whatever the predicted values are, append them to the same Excel sheet as a new row.
#7
Yes, I am reading the data from an xlsx sheet, but the sheet contains the whole data set. What I want to do is:

1. Read the data from the main input file.
2. Split the dataset into trainSet and testSet (I do not want to split randomly).
3. On each iteration (prediction), append the ith row of testSet to trainSet before the next prediction.
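The three steps above can be sketched as a walk-forward loop. This is a minimal sketch, not the only way to do it: the helper name `expanding_window_predict` and its `make_model` factory argument are hypothetical, the split is positional (the first `n_train` rows are the initial training set, no shuffling), and it assumes the *actual* observed row, not the prediction, is what joins the training set before the next step.

```python
import numpy as np


def expanding_window_predict(X, y, n_train, make_model):
    """Walk-forward prediction with a growing training set.

    Rows 0..n_train-1 form the initial training set. For each later row i,
    a fresh model is fit on rows 0..i-1 and used to predict row i; row i
    (with its actual label) is then included when predicting row i+1.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    predictions = []
    for i in range(n_train, len(X)):
        model = make_model()              # new, unfitted model each step
        model.fit(X[:i], y[:i])           # train on every row before i
        pred = model.predict(X[i:i + 1])  # slice keeps a 2-D (1, n_features) shape
        predictions.append(pred[0])
    return np.array(predictions)
```

With the data from the first post, one option is `n_train = int(0.9 * len(x))`, then `preds = expanding_window_predict(x, y, n_train, lambda: RandomForestClassifier(criterion='entropy', n_estimators=100, random_state=1))` and `accuracy_score(y.iloc[n_train:], preds)`. Refitting a 100-tree forest for every test row is slow on large data, but for a small sheet it keeps the logic simple. If you only need a single non-random split, `train_test_split(x, y, test_size=0.1, shuffle=False)` keeps the original row order.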