How to update trainSet on each iteration - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to update trainSet on each iteration (/thread-9775.html)
How to update trainSet on each iteration - Raj - Apr-27-2018

Hi,

I am using the random forest method to predict a response variable. My train set is 70% of the input data and my test set is 30%. While predicting the 30%, I want to add each ith test row to the train set after each prediction iteration, as described below.

For example:

My input data: 1 2 3 4 5 6 7 8 9 10
My train set (initially): 1 2 3 4 5 6 7
My test set: 8 9 10

After I predict the 8th row (first prediction), the train set will update to 1~8; when I predict the 9th row, the train set will update to 1~9; and so on.

My code is as below:

```python
# -*- coding: utf-8 -*-
"""
Created on Fri Apr 27 21:33:14 2018

@author: user
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

dataFileName='RandomForestInput.xlsx'
sheetName='Data'
dataRaw=pd.read_excel(dataFileName,sheetname=sheetName)
noData=len(dataRaw)

import matplotlib.pylab as plt
from sklearn.cross_validation import train_test_split
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

labels=['x1','x2','x3']
x=dataRaw[labels]
y=dataRaw['y']

X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.1,random_state=0)
sc=StandardScaler()
sc.fit(X_train)
x_std=sc.transform(x)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)

from sklearn.svm import SVC
from numpy import stack
from sklearn.metrics import accuracy_score
from sklearn.svm import SVR

linear_svm=SVC(kernel='linear')
linear_svm.fit(X_train_std,Y_train)
y_pred=linear_svm.predict(X_test_std)
coef=linear_svm.coef_[0]
coef=np.absolute(coef)
svm_indices=np.argsort(coef)[::-1]
print('Linear SVM')
print("Accuracy: %.2f" %accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %-*s %f" % (f+1,30,labels[svm_indices[f]],coef[svm_indices[f]])))

from sklearn.ensemble import RandomForestClassifier
from numpy import stack
from sklearn.metrics import accuracy_score

forest=RandomForestClassifier(criterion='entropy',n_estimators=100,random_state=1,n_jobs=2)
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
importances=forest.feature_importances_
indices=np.argsort(importances)[::-1]
print('RandonForest')
print("Accuracy: %.2f" % accuracy_score(Y_test,y_pred))
for f in range(X_train.shape[1]):
    print(("%2d) %~*s %f" %(f+1,30,labels[indices[f]],importances[indices[f]])))
```

The SVM method works, but the random forest method gives an error as below:
RE: How to update trainSet on each iteration - nilamo - Apr-27-2018

(Apr-27-2018, 03:25 PM)Raj Wrote: NotFittedError: Estimator not fitted, call

Try calling fit() first :p

(Apr-27-2018, 03:25 PM)Raj Wrote:

```python
forest.fit=(X_train,Y_train)
y_pred=forest.predict(X_test)
```

You're not calling fit, you're replacing the function with a tuple.
RE: How to update trainSet on each iteration - Raj - Apr-28-2018

Where do I call fit() in my code?

RE: How to update trainSet on each iteration - nilamo - Apr-28-2018

Probably right where you're already almost calling it, in the code I already quoted. There's an = that shouldn't be there.
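To make the point about the stray `=` concrete, here is a minimal sketch of the difference between assigning to `fit` and calling it. `TinyEstimator` is a hypothetical stand-in written for this illustration, not a scikit-learn class; it only mimics the "must fit before predict" behaviour, and its error message is made up for the example.

```python
class TinyEstimator:
    """Hypothetical stand-in for a model with fit/predict (not scikit-learn)."""

    def __init__(self):
        self.fitted = False

    def fit(self, X, y):
        # A real model would learn from X and y here.
        self.fitted = True
        return self

    def predict(self, X):
        if not self.fitted:
            raise RuntimeError("Estimator not fitted, call fit() first")
        return [0 for _ in X]


X_train, y_train, X_test = [[1], [2]], [0, 0], [[3]]

# Buggy form: the '=' turns the call into an attribute assignment,
# so the method is replaced by a tuple and no training happens.
broken = TinyEstimator()
broken.fit = (X_train, y_train)
broken_fit_type = type(broken.fit).__name__
try:
    broken.predict(X_test)
    broken_error = None
except RuntimeError as exc:
    broken_error = str(exc)

# Fixed form: drop the '=' so that fit is actually called.
fixed = TinyEstimator()
fixed.fit(X_train, y_train)
fixed_pred = fixed.predict(X_test)

print(broken_fit_type, "|", broken_error, "|", fixed_pred)
```

Applied to the code in the first post, the same change would turn `forest.fit=(X_train,Y_train)` into `forest.fit(X_train,Y_train)`.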
RE: How to update trainSet on each iteration - Raj - Apr-29-2018

I ran the code successfully:

```
runfile('D:/Mekala_Backupdata/PythonCodes/randonForest_SVR.py', wdir='D:/Mekala_Backupdata/PythonCodes')
Linear SVM
Accuracy: 0.25
1) x3 0.000000
2) x1 0.000000
3) x2 0.000000
RandonForest
Accuracy: 0.25
1) x2 0.405015
2) x1 0.310160
3) x3 0.284826
```

But here is my follow-up question. My initial train set is 90% and my test set is 10%. I want to update the train set on each iteration. For example, if my total data set has 10 rows and the initial train set size is 7 (1~7), I predict 8, 9, 10. After I predict the 8th one, my train set becomes 8 rows (1~8) to predict the 9th; after predicting the 9th one, the train set updates to 1~9 to predict the 10th.

RE: How to update trainSet on each iteration - nilamo - Apr-30-2018

Ok :) You're reading the input from an excel sheet, correct? So whatever the predicted values are, append that to the same excel sheet as a new row.

RE: How to update trainSet on each iteration - Raj - May-01-2018

Yes, I am reading the data from an xlsx sheet, but the sheet contains the total data set:

1. Read data from the main input file
2. Split the dataset into trainSet & testSet (I do not want to split randomly)
3. On each iteration (prediction), append the testSet data of the ith row to the trainSet before the next prediction
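The three steps in the last post describe an expanding-window (walk-forward) loop. The sketch below is one possible way to write it, not an answer given in the thread. It assumes the rows are already in the desired order, that the true label of row i (as stored in the sheet) is what joins the train set, and that the model is refitted from scratch on every iteration. `walk_forward_predict` and `MajorityClassifier` are hypothetical names; the stand-in model only keeps the example free of dependencies.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def walk_forward_predict(make_model, X, y, n_initial):
    """Predict rows n_initial..end one at a time with a growing train set."""
    predictions, train_sizes = [], []
    for i in range(n_initial, len(X)):
        model = make_model()            # fresh, unfitted model each iteration
        model.fit(X[:i], y[:i])         # ordered split: train on rows 0..i-1
        predictions.append(model.predict(X[i:i + 1])[0])  # predict row i only
        train_sizes.append(i)
        # Row i joins the train set on the next pass, because the slice
        # X[:i + 1] then includes it together with its true label y[i].
    return predictions, train_sizes


# Ten rows as in the thread's example: train on 1~7, then predict 8, 9, 10.
X = [[v] for v in range(1, 11)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
preds, train_sizes = walk_forward_predict(MajorityClassifier, X, y, n_initial=7)
print(train_sizes)  # [7, 8, 9]
```

With scikit-learn installed, the same function should accept a factory such as `lambda: RandomForestClassifier(n_estimators=100, random_state=1)` together with NumPy arrays; for pandas objects the slices would become `X.iloc[:i]` and `X.iloc[i:i + 1]`. Refitting on every row is simple but slow for large test sets, so this is a starting point rather than a tuned solution.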