machine learning error (using jupyter)


machine learning error (using jupyter) - calonia - Jun-26-2019

I was trying to play around with machine learning and make predictions on this data set using Jupyter:

https://github.com/dpkravi/DecisionTreeClassifier/blob/master/data.csv

But I get errors. I don't know if my code is flawed, or if the data set isn't suitable for machine learning.
I am sorry for the inconvenience. I am a beginner in programming.

import pandas
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

isotope_data = pandas.read_csv('data.csv')
x = isotope_data.drop(columns=['pH'])
y = isotope_data['pH']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

model = DecisionTreeClassifier()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
score = accuracy_score(y_test, predictions)
score

Error:
ValueError                                Traceback (most recent call last)
<ipython-input-196-75a9ac0cfa74> in <module>
     10 
     11 model = DecisionTreeClassifier()
---> 12 model.fit(x_train, y_train)
     13 predictions = model.predict(x_test)
     14 score = accuracy_score(y_test, predictions)

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    799             sample_weight=sample_weight,
    800             check_input=check_input,
--> 801             X_idx_sorted=X_idx_sorted)
    802         return self
    803 

~\Anaconda3\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    138 
    139         if is_classification:
--> 140             check_classification_targets(y)
    141             y = np.copy(y)
    142 

~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
    169     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
    170                       'multilabel-indicator', 'multilabel-sequences']:
--> 171         raise ValueError("Unknown label type: %r" % y_type)
    172 
    173 

ValueError: Unknown label type: 'continuous'

RE: machine learning error (using jupyter) - ThomasL - Jun-26-2019

The column 'pH' is 'continuous', which means it consists of real numbers.
A classifier expects a categorical target, i.e. one of the types ['binary', 'multiclass', 'multiclass-multioutput', 'multilabel-indicator', 'multilabel-sequences'],
for example the column 'quality', which is the label for this dataset.
The column 'pH' is a feature, so it belongs in x, not y.
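
To make the difference concrete, here is a minimal sketch of the same workflow with 'quality' as the target. It does not read data.csv. It builds a small synthetic frame instead, so the 'alcohol' column, the value ranges, and the random seeds are all assumptions for illustration only. `type_of_target` is the scikit-learn helper that produces the type name seen in the error message.

```python
import numpy as np
import pandas
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for data.csv (assumption: the real file has
# a 'pH' column and a 'quality' column; 'alcohol' is hypothetical).
rng = np.random.default_rng(0)
n = 200
data = pandas.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, n),
    "pH": rng.uniform(2.8, 3.8, n),      # real-valued measurements
    "quality": rng.integers(3, 9, n),    # integer grades 3..8
})

# Ask scikit-learn how it would interpret each column as a target.
ph_type = type_of_target(data["pH"])
quality_type = type_of_target(data["quality"])
print(ph_type, quality_type)

# Use 'quality' as the label; 'pH' stays in x as a feature.
x = data.drop(columns=["quality"])
y = data["quality"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(x_train, y_train)  # no ValueError: the target is categorical
predictions = model.predict(x_test)
score = accuracy_score(y_test, predictions)
print(score)
```

On random data like this the accuracy is meaningless; the point is only that `fit` accepts the target. If predicting 'pH' itself is really the goal, that is a regression problem, and `DecisionTreeRegressor` from `sklearn.tree` with a regression metric would be the thing to try instead of a classifier with `accuracy_score`.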