May-23-2020, 12:26 PM
Hi guys,
I am trying to predict the target variable "Exchange Rate EURUSD" using a data set that I created myself. It contains a lot of economic indicators and the monthly EURUSD closing price from 01/01/2000 to 01/12/2020, which gives me 240 rows (20 years * 12 months) and 34 columns (33 indicators and 1 EURUSD exchange rate).
Here is a sample of the data set:
https://imgur.com/a/iJuGtXw
I kept the date as the index and replaced all "," with "." in Python. Because the data is a time series, it can't be split randomly into a train and test set, so I split it chronologically instead. All in all, the code looks like this:
```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load Data Set
df = pd.read_csv("C:/merged.csv")
df = df.set_index('Date')
df["EURUSD Closing Price"] = df["EURUSD Closing Price"].replace(',', '.', regex=True).astype(float)

# Define variables
X = df.drop(["EURUSD Closing Price"], axis=1).values
y = df["EURUSD Closing Price"].values

# Split data into 75 train / 25 test
# (the random split on the next line is overwritten by the chronological slices below it)
X_train, X_test, y_train, y_test = train_test_split(X, y)
X_train = X[:int(X.shape[0] * 0.75)]
X_test = X[int(X.shape[0] * 0.75):]
y_train = y[:int(X.shape[0] * 0.75)]
y_test = y[int(X.shape[0] * 0.75):]

# RF Train and predict
rf = RandomForestRegressor(n_estimators=1000, random_state=42)
rf.fit(X_train, y_train)
rf.predict(y_train)
```
I know I should predict on the test set rather than the training set, but even this code raises an error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
If I do that, I get a new error:
ValueError: Number of features of the model must match the input. Model n_features is 33 and input n_features is 1
I roughly understand what the problem is, but I have no idea how to fix it. I would really appreciate any help!
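
A minimal sketch of what the two error messages seem to point to, using synthetic random data in place of the real data set (the random arrays, the variable name `split` and the reduced `n_estimators` are assumptions for illustration only). The model was fitted on 33 feature columns, so `predict` appears to expect a 2D feature matrix with those same 33 columns, such as `X_test`, rather than the 1D target array:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data: 240 monthly rows, 33 indicator columns.
rng = np.random.default_rng(42)
X = rng.normal(size=(240, 33))
y = rng.normal(size=240)

# Chronological 75/25 split by slicing, as in the post (no shuffling).
split = int(X.shape[0] * 0.75)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# n_estimators is reduced here only to keep the sketch fast.
rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)

# Pass the feature matrix (2D, 33 columns) to predict, not the target array.
y_pred = rf.predict(X_test)

print(X_train.shape, X_test.shape, y_pred.shape)
```

The predictions in `y_pred` can then be compared against `y_test`, which has the same length.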