Python Project - Parkinson's Detection

Python Project - Parkinson's Detection - shivani - Sep-27-2019

Hi, I am trying to build a machine learning model for the Parkinson's dataset, but I am having trouble extracting the features from it. I need help extracting the right features and labels.

import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

#read parkinsons data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
data=pd.read_csv(url)
print(data.head())

#extract features and labels
features=data.loc[:,data.columns!='status'].values
labels=data.loc[:,'status'].values

#Scale the features 
scaler=StandardScaler()
features=scaler.fit_transform(features)

print(features.shape)
#Splitting the dataset
x_train,x_test,y_train,y_test=train_test_split(features, labels, test_size=0.2, random_state=7)

model=XGBClassifier()
model.fit(x_train,y_train)

y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)

I am getting this error:

Traceback (most recent call last):
  File "D:\practise\parkinsons detection\detect_parkinson.py", line 20, in <module>
    features=scaler.fit_transform(features)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\base.py", line 553, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\preprocessing\data.py", line 639, in fit
    return self.partial_fit(X, y)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\preprocessing\data.py", line 663, in partial_fit
    force_all_finite='allow-nan')
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 496, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'phon_R01_S01_1'

RE: Python Project - Parkinson's Detection - jefsummers - Sep-27-2019

Judging by the error message, features contains some non-numeric text, which raises the exception at line 20 of your script (features=scaler.fit_transform(features)). I suggest printing features on the line before that to see what it contains.
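Besides printing features, you can ask pandas which columns are not numeric. Below is a minimal sketch on a tiny made-up DataFrame standing in for the downloaded file. The name value mimics the one in the traceback; the columns feature_a and feature_b and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the UCI file: a string 'name' column,
# two made-up numeric columns, and the 'status' label.
data = pd.DataFrame({
    "name": ["phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S02_1"],
    "feature_a": [119.992, 122.400, 116.682],
    "feature_b": [0.00784, 0.00968, 0.01050],
    "status": [1, 1, 0],
})

# Show the dtype of every column; 'object' usually means strings.
print(data.dtypes)

# Collect the columns that StandardScaler cannot convert to float.
non_numeric = data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # → ['name']
```

On the real data, running the last two lines after pd.read_csv(url) should point you at the offending column.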

RE: Python Project - Parkinson's Detection - karansingh - Sep-28-2019

You are getting this error because the first column of the dataset, name, holds strings such as 'phon_R01_S01_1'.
The name is not a useful feature for making predictions, so we need to exclude that first column from the features.

Use this instead:

features=data.loc[:,data.columns!='status'].values[:,1:]

The slice [:,1:] keeps all rows and every column from index 1 to the end, which drops the name column at index 0.
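Here is a minimal sketch of what that slice does, again on a tiny hypothetical DataFrame (feature_a, feature_b and the values are made up). It compares the positional slice with a name-based alternative, data.drop(columns=['name', 'status']), which does not depend on name being the first column. It also shows that calling .values on the mixed columns gives an object array; StandardScaler converts that to float once the strings have been sliced off.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; 'status' sits between the features on purpose.
data = pd.DataFrame({
    "name": ["phon_R01_S01_1", "phon_R01_S01_2"],
    "feature_a": [119.992, 122.400],
    "status": [1, 0],
    "feature_b": [0.00784, 0.00968],
})

# Positional: mask out 'status', convert to an array, drop column 0 ('name').
by_position = data.loc[:, data.columns != "status"].values[:, 1:]

# Name-based: drop both columns by label before converting.
by_name = data.drop(columns=["name", "status"]).values

print(by_position.dtype)  # object, since .values ran while 'name' was present
print(by_name.dtype)      # float64

# Both approaches keep the same numbers in the same column order.
same = np.allclose(by_position.astype(float), by_name)
print(same)  # → True
```

The name-based version is a little more robust if the column order of the file ever changes.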

With this change, the model's accuracy is 94.87%.
During my research, I found a Python project quite similar to this one that you may want to go through: Python Project - Detecting Parkinson's Disease.

Corrected code:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# read parkinsons data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
data = pd.read_csv(url)
print(data.head())

# extract features and labels; [:, 1:] drops the string 'name' column
features = data.loc[:, data.columns != 'status'].values[:, 1:]
labels = data.loc[:, 'status'].values

# scale the features to zero mean and unit variance
scaler = StandardScaler()
features = scaler.fit_transform(features)

print(features.shape)
# split the dataset: 80% training, 20% test
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=7)

model = XGBClassifier()
model.fit(x_train, y_train)

# report test accuracy as a percentage
y_pred = model.predict(x_test)
print(accuracy_score(y_test, y_pred) * 100)
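One caution about the code above: StandardScaler is fit on all rows before train_test_split, so the test rows influence the scaling statistics. A common fix is to split first and wrap the scaler and model in an sklearn Pipeline, so the scaler is fit on the training split only. The sketch below is self-contained, so it assumes synthetic data from make_classification and uses LogisticRegression as a stand-in model; neither comes from the thread, and XGBClassifier() should slot into the same position.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric features (size chosen arbitrarily).
features, labels = make_classification(n_samples=195, n_features=22, random_state=7)

# Split first, so the test rows never influence the scaling statistics.
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=7)

# The pipeline fits StandardScaler on x_train only, then reuses those
# means and standard deviations when transforming x_test at predict time.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(x_train, y_train)

y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred) * 100
print(accuracy)
```

For tree-based models such as XGBoost, scaling has little effect on the splits, so dropping the scaler entirely is also a reasonable choice.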