Python Forum
machine learning error (using jupyter)


machine learning error (using jupyter) - calonia - Jun-26-2019

I was playing around with machine learning and tried to make predictions on this data set in Jupyter:

https://github.com/dpkravi/DecisionTreeClassifier/blob/master/data.csv

but I get an error. I don't know whether my code is flawed or the data set isn't suitable for machine learning.
Sorry for the inconvenience; I am a beginner in programming.

import pandas
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

isotope_data = pandas.read_csv('data.csv')
x = isotope_data.drop(columns=['pH'])
y = isotope_data['pH']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
score = accuracy_score(y_test, predictions)
score
Error:
ValueError                                Traceback (most recent call last)
<ipython-input-196-75a9ac0cfa74> in <module>
     10 
     11 model = DecisionTreeClassifier()
---> 12 model.fit(x_train, y_train)
     13 predictions = model.predict(x_test)
     14 score = accuracy_score(y_test, predictions)

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    799             sample_weight=sample_weight,
    800             check_input=check_input,
--> 801             X_idx_sorted=X_idx_sorted)
    802         return self
    803 

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    138 
    139         if is_classification:
--> 140             check_classification_targets(y)
    141             y = np.copy(y)
    142 

~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    169     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    170                       'multilabel-indicator', 'multilabel-sequences']:
--> 171         raise ValueError("Unknown label type: %r" % y_type)
    172 
    173 

ValueError: Unknown label type: 'continuous'



RE: machine learning error (using jupyter) - ThomasL - Jun-26-2019

The column 'pH' is 'continuous', which means it consists of real numbers.
A classifier's target must be categorical, i.e. one of the types ['binary', 'multiclass', 'multiclass-multioutput', 'multilabel-indicator', 'multilabel-sequences'],
for example the column 'quality', which is the label for this dataset.
The column 'pH' is a feature.
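
To make the point above concrete, here is a minimal sketch of two possible ways around the error. It does not load data.csv: the DataFrame is a small synthetic stand-in with made-up values, and the 'alcohol' column is only a hypothetical feature. Only the column names 'pH' and 'quality' come from this thread. The first option classifies 'quality' and keeps 'pH' as a feature; the second keeps 'pH' as the target but swaps in DecisionTreeRegressor, which accepts real-valued targets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for data.csv (values are made up for illustration;
# 'alcohol' is a hypothetical feature column).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, size=200),
    "pH": rng.uniform(2.8, 3.8, size=200),      # real-valued column
    "quality": rng.integers(3, 9, size=200),    # integer class labels 3..8
})

# type_of_target is the check scikit-learn runs on the target before fitting.
ph_type = type_of_target(data["pH"])
quality_type = type_of_target(data["quality"])
print("pH:", ph_type, "| quality:", quality_type)

# Option 1: classify 'quality' and keep 'pH' as one of the features.
X = data.drop(columns=["quality"])
y = data["quality"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
clf_accuracy = clf.score(X_test, y_test)  # mean accuracy on the test split

# Option 2: predict 'pH' itself, using a regressor instead of a classifier.
X = data.drop(columns=["pH"])
y = data["pH"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
reg = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
reg_r2 = reg.score(X_test, y_test)  # R^2 on the test split, not accuracy
```

Because the synthetic columns are random, the scores here mean nothing; the sketch only shows which estimator accepts which kind of target. Note that accuracy_score does not apply to a regressor's output, so with option 2 a regression metric such as R^2 or mean squared error would be used instead.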