ValueError: could not convert string to float: Close??


+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: ValueError: could not convert string to float: Close?? (/thread-5900.html)

ValueError: could not convert string to float: Close?? - BlackHeart - Oct-27-2017

Honestly, I don't even understand what the issue is here... Could you guys help me out please?

It may be referring to 'Close', the name of one of the columns in my dataset.csv file.

Error message:

  File "/home/b/pycharm-community-2017.2.3/helpers/pydev/pydevd.py", line 1599, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/b/pycharm-community-2017.2.3/helpers/pydev/pydevd.py", line 1026, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/b/PycharmProjects/ANN1a/ANN2-Keras1a", line 41, in <module>
    results = cross_val_score(pipeline, X, Y, cv=kfold)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_validation.py", line 342, in cross_val_score
    pre_dispatch=pre_dispatch)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_validation.py", line 206, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 625, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 588, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 111, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/_parallel_backends.py", line 332, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_validation.py", line 488, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_validation.py", line 523, in _score
    return _multimetric_score(estimator, X_test, y_test, scorer)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_validation.py", line 553, in _multimetric_score
    score = scorer(estimator, X_test, y_test)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/scorer.py", line 244, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/metaestimators.py", line 115, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/pipeline.py", line 486, in score
    Xt = transform.transform(Xt)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 681, in transform
    estimator=self, dtype=FLOAT_DTYPES)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: Close

Here is my code in its entirety:

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


# load dataset
dataframe = pandas.read_csv("PTNprice.csv", delim_whitespace=True, header=None, usecols=[1,2,3,4])
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:4]
Y = dataset[:,1]

# define the model
def larger_model():
	# create model
	model = Sequential()
	model.add(Dense(100, input_dim=4, kernel_initializer='normal', activation='relu'))
	model.add(Dense(50, kernel_initializer='normal', activation='relu'))
	model.add(Dense(1, kernel_initializer='normal'))
	# Compile model
	model.compile(loss='mean_squared_error', optimizer='adam')
	return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# evaluate model with standardized dataset
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)  # random_state only matters with shuffle=True; newer scikit-learn rejects it otherwise
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))
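The last traceback line is numpy failing to turn a string into a float while `StandardScaler` checks its input. A likely cause is `header=None` in the `read_csv` call above: if the file's first line holds the column names, pandas reads that line as an ordinary data row, so every selected column is stored as strings. Numeric strings such as "0.78" still convert, so folds that do not contain the header row can train, which would be consistent with the failure only appearing at scoring time. A minimal sketch with a made-up file, assuming a whitespace-separated layout with a `Date Open High Low Close` header line; `sep=r"\s+"` is used as the equivalent of `delim_whitespace=True`, which recent pandas releases deprecate:

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for PTNprice.csv: a header line, then price rows.
CSV_TEXT = """Date Open High Low Close
2017-10-02 0.78 0.80 0.77 0.79
2017-10-03 0.79 0.82 0.78 0.81
2017-10-04 0.81 0.83 0.80 0.82
"""


def load(header):
    """Read columns 1-4 (Open, High, Low, Close) and return a NumPy array."""
    frame = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+", header=header,
                        usecols=[1, 2, 3, 4])
    return frame.values


# header=None: the header line becomes row 0, the array has dtype=object,
# and the scaler cannot convert the header words to floats.
bad = load(header=None)
try:
    StandardScaler().fit_transform(bad)
    failed = False
except ValueError as exc:
    failed = True
    print("header=None ->", exc)

# header=0: the first line supplies the column names and every value is a float.
good = load(header=0)
scaled = StandardScaler().fit_transform(good)
print("header=0 ->", good.dtype, scaled.shape)
```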

RE: ValueError: could not convert string to float: Close?? - Larz60+ - Oct-27-2017

What is the name of your program?
The error was generated in validation.py, line 433, but what caused it will be the last line in the traceback with your program name on it.
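In the traceback above, that last line is the one naming `ANN2-Keras1a`, line 41: `results = cross_val_score(pipeline, X, Y, cv=kfold)`; everything below it is inside scikit-learn. The same lookup can be done with the standard-library `traceback` module. This is a hypothetical Python 3 illustration: `run_script` stands in for the poster's script and `json.loads` for the library call chain that actually raises.

```python
import json
import traceback


def run_script():
    # Stand-in for your own program: the call on this line is the real cause,
    # even though the exception is raised deeper inside a library module.
    return json.loads("Close")


# File name of this script as Python recorded it (also works for exec'd code).
THIS_FILE = run_script.__code__.co_filename


def last_frame_in(tb, filename):
    """Return the last traceback frame whose source file is `filename`, or None."""
    ours = [f for f in traceback.extract_tb(tb) if f.filename == filename]
    return ours[-1] if ours else None


try:
    run_script()
except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
    innermost = traceback.extract_tb(exc.__traceback__)[-1]
    culprit = last_frame_in(exc.__traceback__, THIS_FILE)
    print("raised in:", innermost.filename, "line", innermost.lineno)
    print("caused by:", culprit.name, "line", culprit.lineno)
```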

RE: ValueError: could not convert string to float: Close?? - BlackHeart - Oct-27-2017

(Oct-27-2017, 04:45 PM) Larz60+ Wrote: What is the name of your program?
The error was generated in validation.py, line 433, but what caused it will be the last line in the traceback with your program name on it.

Well, I've been making a new file and renaming it each time I make a change to the code, so that if something doesn't work I can always go back to where I was. I keep changing the name: ann1a, ann1b, ann1c, ann1-keras1a, etc. I have them all stored in the same project folder.

I actually think I may have gotten it to work last night! I changed the header argument from header=None to header=1, and it seemed to realize that 'Close' was part of the column headers.

before:

# load dataset
dataframe = pandas.read_csv("PTNprice.csv", delim_whitespace=True, header=None, usecols=[1,2,3,4])
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:4]
Y = dataset[:,1]

after:

# load dataset
dataframe = pandas.read_csv("PTNprice.csv", delim_whitespace=True, header=1, usecols=[1,2,3,4])
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:4]
Y = dataset[:,1]
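`header=1` makes the error go away, but if the column names are on the first line of the file it is probably not the intended setting: pandas then takes the second line as the header and discards everything above it, so the first price row becomes the column names instead of data. `header=0` (also the default when `names` is not given) keeps every row. A sketch with a made-up three-row file, assuming the names sit on the first line:

```python
import io

import pandas as pd

# Hypothetical three-row price file with the column names on the first line.
CSV_TEXT = """Date Open High Low Close
2017-10-02 0.78 0.80 0.77 0.79
2017-10-03 0.79 0.82 0.78 0.81
2017-10-04 0.81 0.83 0.80 0.82
"""


def read(header):
    return pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+", header=header,
                       usecols=[1, 2, 3, 4])


first = read(header=0)   # names from line 0, all three price rows kept
second = read(header=1)  # names from line 1, line 0 dropped, one row lost

print(list(first.columns), len(first))
print(list(second.columns), len(second))
```

In `second`, the column names are the values of the first price row, and only two rows of data remain.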

I'm starting to reach my limits with this code, though, since I've only been coding in Python for a few days. I don't think the output is coming out correctly, and I don't understand it well enough to fix it. I think it's returning 0.00% accuracy, and I don't understand its predictions. I'm trying to get it to crunch 4 columns (Open, High, Low, Close) and then predict the next Close value.

output:

Larger: 0.00 (0.00) MSE
[ 0.78021598  0.79241288  0.81000006 ...,  3.64232779  3.59621549
  3.79605269]
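The `0.00` here is a mean squared error, not an accuracy, and `%.2f` prints any value below 0.005 as `0.00`. Note also that `cross_val_score` treats scores as "greater is better", which is why scikit-learn exposes its error metrics as negated scorers such as `'neg_mean_squared_error'`; the Keras 2 scikit-learn wrapper's `KerasRegressor.score` appears to follow the same convention and return the negated loss, so the values in `results` are likely `-MSE`. The sketch below swaps in scikit-learn's `LinearRegression` for the Keras model and uses synthetic data (both assumptions, so it runs without Keras), only to show the sign convention and a more informative print format:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 4 features, a nearly noise-free linear target.
rng = np.random.RandomState(7)
X = rng.rand(200, 4)
Y = X @ np.array([0.5, -0.2, 0.1, 0.9]) + rng.normal(scale=0.01, size=200)

pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("model", LinearRegression()),  # stand-in for KerasRegressor
])
kfold = KFold(n_splits=10)  # unshuffled folds, as in the thread's code
scores = cross_val_score(pipeline, X, Y, cv=kfold,
                         scoring="neg_mean_squared_error")

mse = -scores  # flip the sign back: a loss where lower is better
print("MSE: %.2f (%.2f)" % (mse.mean(), mse.std()))  # tiny values print as 0.00
print("MSE: %.3g (%.3g)" % (mse.mean(), mse.std()))  # shows the actual magnitude
```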

Here's my code in its entirety:

import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


# load dataset
dataframe = pandas.read_csv("PTNprice.csv", delim_whitespace=True, header=1, usecols=[1,2,3,4])
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:4]
Y = dataset[:,1]

# define the model
def larger_model():
	# create model
	model = Sequential()
	model.add(Dense(100, input_dim=4, kernel_initializer='normal', activation='relu'))
	model.add(Dense(50, kernel_initializer='normal', activation='relu'))
	model.add(Dense(1, kernel_initializer='normal'))
	# Compile model
	model.compile(loss='mean_squared_error', optimizer='sgd')
	return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)  # random_state only matters with shuffle=True; newer scikit-learn rejects it otherwise
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))

pipeline.fit(X, Y)
prediction = pipeline.predict(X)
print(prediction)
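On the goal of predicting the next Close: in the code, `Y = dataset[:,1]` is the second of the four selected columns (High, assuming they are Open, High, Low, Close in that order as described above), and that same column is also one of the inputs in `X = dataset[:,0:4]`. The network can therefore reproduce the target from its own input, which would make a near-zero MSE unsurprising. To predict the *next* Close, each row's target has to be the following row's Close. A minimal sketch with made-up prices, using pandas `shift(-1)`; the column names are assumed to match the header in PTNprice.csv:

```python
import pandas as pd

# Made-up daily prices; in the thread these would come from PTNprice.csv.
prices = pd.DataFrame({
    "Open":  [0.78, 0.79, 0.81, 0.82],
    "High":  [0.80, 0.82, 0.83, 0.85],
    "Low":   [0.77, 0.78, 0.80, 0.81],
    "Close": [0.79, 0.81, 0.82, 0.84],
})
features = ["Open", "High", "Low", "Close"]

# Target: the Close of the *following* row. shift(-1) leaves NaN in the last
# row (there is no next Close yet), so that row is dropped from training.
next_close = prices["Close"].shift(-1)
X = prices[features].iloc[:-1].to_numpy()
Y = next_close.iloc[:-1].to_numpy()

# The final row is what you would feed a trained model to predict the next Close.
X_latest = prices[features].iloc[[-1]].to_numpy()

print(X.shape, Y)
print(X_latest.shape)
```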

quick edit:

Maybe the prediction is supposed to be for Y instead of X, because Y is the output layer?

pipeline.fit(X, Y)
prediction = pipeline.predict(X)
print(prediction)

pipeline.fit(X, Y)
prediction = pipeline.predict(Y)
print(prediction)
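On the quick-edit question: `predict` takes inputs shaped like the training features, so it should be given `X` (or new rows with the same four columns), not `Y`. `Y` is a 1-D array of targets, while the pipeline expects a 2-D array with four columns, so the `StandardScaler` step rejects it. The sketch uses `LinearRegression` as a stand-in for the Keras model and made-up data, both assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: 50 rows of four price-like features and one target per row.
rng = np.random.RandomState(0)
X = rng.rand(50, 4)
Y = X.sum(axis=1)

pipeline = Pipeline([("standardize", StandardScaler()),
                     ("model", LinearRegression())])  # stand-in for KerasRegressor
pipeline.fit(X, Y)

prediction = pipeline.predict(X)  # one prediction per input row
print(prediction.shape)

# Predicting from a single new row: keep it 2-D, shape (1, 4).
new_row = X[-1].reshape(1, -1)
single = pipeline.predict(new_row)
print(single)

rejected = False
try:
    pipeline.predict(Y)  # 1-D targets are not valid inputs
except ValueError as exc:
    rejected = True
    print("predict(Y) fails:", exc)
```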

RE: ValueError: could not convert string to float: Close?? - Larz60+ - Oct-27-2017

A better way to keep backups is to keep the same program name:

Put all source into a directory named src.
Create another directory at the same level named backup.
Before making major changes, create a new directory inside backup with a name similar to src_backup_MMDDYY_time.
Copy the full src directory into the newly created backup directory.

This way you can go back as far as you need to and restore to any earlier point.

It will make life a lot easier in the long run.
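The backup procedure above can be scripted. A minimal sketch, assuming the layout described (a `src` directory with a sibling `backup` directory in the project folder); the function name `backup_src` and the `HHMMSS` format for "time" are illustrative choices, not part of the original advice:

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_src(project_dir):
    """Copy <project_dir>/src into <project_dir>/backup/src_backup_MMDDYY_HHMMSS."""
    project = Path(project_dir)
    stamp = datetime.now().strftime("%m%d%y_%H%M%S")  # MMDDYY_time
    target = project / "backup" / ("src_backup_" + stamp)
    # copytree creates `target` and any missing parents, and raises if `target`
    # already exists, so an earlier backup is never silently overwritten.
    shutil.copytree(project / "src", target)
    return target


# Demo in a throwaway directory: a src folder containing one script.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "ann.py").write_text("print('hello')\n")
    saved = backup_src(tmp)
    copied = sorted(p.name for p in saved.iterdir())
    print(saved.relative_to(tmp), copied)
```

Running it before each major change gives one timestamped snapshot per run, while the script itself keeps the same name in `src`.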