Python Forum

Hi,
I am using the random forest method to predict a response variable. My train set is 70% of the input data and my test set is 30%. When predicting the 30%, I want to add each ith row of the test set to the train set after each prediction iteration, as described below.
For example:
My input data:
1
2
3
4
5
6
7
8
9
10

My train set (initially):
1
2
3
4
5
6
7

My test set
8
9
10

After I predict the 8th row (the first prediction), the train set will update to 1~8; when I predict the 9th row, the train set will update to 1~9, and so on.

My actual

My code is below:

# -*- coding: utf-8 -*-
"""
Created on Fri Apr 27 21:33:14 2018

@author: user
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
dataFileName='RandomForestInput.xlsx'
sheetName='Data'
dataRaw=pd.read_excel(dataFileName,sheetname=sheetName)
noData=len(dataRaw)
import matplotlib.pylab as plt
from sklearn.cross_validation import train_test_split
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

labels=['x1','x2','x3']
x=dataRaw[labels]
y=dataRaw['y']

X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.1,random_state=0)

sc=StandardScaler()
sc.fit(X_train)
x_std=sc.transform(x)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)

from sklearn.svm import SVC
from numpy import stack
from sklearn.metrics import accuracy_score
from sklearn.svm import SVR

linear_svm=SVC(kernel='linear')
linear_svm.fit(X_train_std,Y_train)
y_pred=linear_svm.predict(X_test_std)
coef=linear_svm.coef_[0]
coef=np.absolute(coef)
svm_indices=np.argsort(coef)[::-1]
print('Linear SVM')
print("Accuracy: %.2f" %accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %-*s %f" % (f+1,30,labels[svm_indices[f]],coef[svm_indices[f]])))

from sklearn.ensemble import RandomForestClassifier
from numpy import stack
from sklearn.metrics import accuracy_score
forest=RandomForestClassifier(criterion='entropy',n_estimators=100,random_state=1,n_jobs=2)
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
importances=forest.feature_importances_
indices=np.argsort(importances)[::-1]
print('RandonForest')
print("Accuracy: %.2f" % accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %-*s %f" %(f+1,30,labels[indices[f]],importances[indices[f]])))

%=====
The SVM method works, but the random forest method gives the error below:

Output:
Linear SVM
Accuracy: 0.25
 1) x3                             0.000000
 2) x1                             0.000000
 3) x2                             0.000000

runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
 1) x3                             0.000000
 2) x1                             0.000000
 3) x2                             0.000000

Error:
Traceback (most recent call last):

  File "<ipython-input-20-b9629da5b974>", line 1, in <module>
    runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')

  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Users\user\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py", line 56, in <module>
    y_pred=forest.predict(X_test)

  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 534, in predict
    proba = self.predict_proba(X)

  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 573, in predict_proba
    X = self._validate_X_predict(X)

  File "C:\Users\user\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py", line 352, in _validate_X_predict
    raise NotFittedError("Estimator not fitted, "

NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

(Apr-27-2018, 03:25 PM) Raj wrote:
NotFittedError: Estimator not fitted, call `fit` before exploiting the model.

Try calling fit() first :p

(Apr-27-2018, 03:25 PM) Raj wrote:
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)

You're not calling fit, you're replacing the function with a tuple.

Where should I call fit() in my code?

Probably right where you're already almost calling it, in the code I already quoted. There's an = that shouldn't be there.
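To make the difference concrete, here is a minimal sketch of the assignment versus the call. The tiny arrays are made up for illustration and are not the data from the Excel sheet; the variable names `broken`, `fit_is_tuple` and `raised_not_fitted` are only there for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

# Tiny made-up data set, only for illustration.
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
Y_train = np.array([0, 1, 0, 1])

# Wrong: the "=" turns the call into an assignment, so the fit attribute
# is overwritten with a tuple and the forest is never trained.
broken = RandomForestClassifier(n_estimators=10, random_state=1)
broken.fit = (X_train, Y_train)
fit_is_tuple = isinstance(broken.fit, tuple)

try:
    broken.predict(X_train)
    raised_not_fitted = False
except NotFittedError:
    raised_not_fitted = True

# Right: call the method, then predict.
forest = RandomForestClassifier(n_estimators=10, random_state=1)
forest.fit(X_train, Y_train)
y_pred = forest.predict(X_train)
```

In the original script the only change needed is to drop the = so the line reads forest.fit(X_train,Y_train).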

I ran the code successfully:
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
RandonForest
Accuracy: 0.25
1) x2 0.405015
2) x1 0.310160
3) x3 0.284826

But I have a follow-up question:

My initial train set is 90% and my test set is 10%. I want to update the train set on each iteration. For example, if my total data set has 10 rows and the initial train set has 7 (1~7), I predict 8, 9 and 10. After I predict the 8th row, the train set becomes 1~8 and is used to predict the 9th; after predicting the 9th, the train set updates to 1~9 and is used to predict the 10th.

Ok :)
You're reading the input from an Excel sheet, correct? So whatever the predicted values are, append them to the same Excel sheet as a new row.

Yes, I am reading the data from an xlsx sheet, but the sheet contains the whole data set. What I want to do is:

1. Read the data from the main input file
2. Split the data set into trainSet and testSet (I do not want to split randomly)
3. On each iteration (prediction), append the ith row of testSet to trainSet before the next prediction
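The three steps above could be sketched roughly as follows. This is only one possible approach, not a definitive implementation. It assumes the rows are already in the order you want, that the true y of each test row is available once it has been predicted (it is in the sheet), and that refitting the forest from scratch on every iteration is acceptable. The DataFrame `data` is a made-up stand-in for the Excel input, and `n_train`, `predictions` and `train_sizes` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Step 1: made-up stand-in for the Excel sheet, 10 rows in their original order.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.rand(10, 3), columns=['x1', 'x2', 'x3'])
data['y'] = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

labels = ['x1', 'x2', 'x3']

# Step 2: non-random split by position. Rows 1~7 are the initial train set,
# rows 8~10 are the test set.
n_train = 7

predictions = []
train_sizes = []

# Step 3: predict one test row at a time; the slice data.iloc[:i] grows by
# one row per iteration, so the row just predicted joins the train set.
for i in range(n_train, len(data)):
    train = data.iloc[:i]
    test_row = data.iloc[[i]]  # double brackets keep a one-row DataFrame

    forest = RandomForestClassifier(n_estimators=100, random_state=1)
    forest.fit(train[labels], train['y'])

    predictions.append(forest.predict(test_row[labels])[0])
    train_sizes.append(len(train))
```

With the real data, `data` would be the DataFrame returned by pd.read_excel and `n_train` could be computed as int(0.7 * len(data)) for a 70/30 split.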
