Python Forum
Python Project - Parkinson's Detection


Python Project - Parkinson's Detection - shivani - Sep-27-2019

Hi, I am trying to build a machine learning model for the Parkinson's dataset. I am having trouble extracting the features from the dataset, and I need help extracting the right features and labels.

import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

#read parkinsons data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
data=pd.read_csv(url)
print(data.head())

#extract features and labels
features=data.loc[:,data.columns!='status'].values
labels=data.loc[:,'status'].values

#Scale the features 
scaler=StandardScaler()
features=scaler.fit_transform(features)

print(features.shape)
#Splitting the dataset
x_train,x_test,y_train,y_test=train_test_split(features, labels, test_size=0.2, random_state=7)

model=XGBClassifier()
model.fit(x_train,y_train)

y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)
I am getting this error:

Error:
Traceback (most recent call last):
  File "D:\practise\parkinsons detection\detect_parkinson.py", line 20, in <module>
    features=scaler.fit_transform(features)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\base.py", line 553, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\preprocessing\data.py", line 639, in fit
    return self.partial_fit(X, y)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\preprocessing\data.py", line 663, in partial_fit
    force_all_finite='allow-nan')
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 496, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'phon_R01_S01_1'



RE: Python Project - Parkinson's Detection - jefsummers - Sep-27-2019

According to the error message, features contains some non-numeric text, which raises the exception at line 20 of your script. I suggest you print features on the line before that to see what it contains.
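As a rough sketch of that check, the snippet below lists the columns pandas did not parse as numbers. It runs on a small made-up frame rather than the real file: the column name name is an assumption about the dataset, feature_a is a placeholder, and every value except 'phon_R01_S01_1' (taken from the traceback) is invented for illustration.

```python
import pandas as pd

# Hypothetical three-row sample; the real dataset has many more rows and columns.
# 'name' is an assumed column name, 'feature_a' is a placeholder.
data = pd.DataFrame({
    "name": ["phon_R01_S01_1", "sample_2", "sample_3"],
    "feature_a": [0.5, 1.5, 2.5],
    "status": [1, 0, 1],
})

# Show the dtype pandas inferred for every column
print(data.dtypes)

# Collect the columns that are not numeric; these cannot be scaled as-is
non_numeric = data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Any column that shows up in non_numeric has to be dropped or encoded before it reaches StandardScaler.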


RE: Python Project - Parkinson's Detection - karansingh - Sep-28-2019

You are getting this error because your dataset contains a name column, which holds strings.
The name is not a useful feature for making predictions here, so we need to exclude the first column from the features.

Use this instead:
features=data.loc[:,data.columns!='status'].values[:,1:]
The [:,1:] slice keeps all rows and every column from index 1 to the end, which drops the name column at index 0.
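A possible alternative, sketched here on a made-up frame, is to drop the columns by name instead of by position, so the selection does not depend on name being the first column. The column name name is an assumption about the dataset, and feature_a and feature_b are placeholders with invented values.

```python
import pandas as pd

# Hypothetical sample; 'name' is an assumed column name,
# 'feature_a' and 'feature_b' are placeholders.
data = pd.DataFrame({
    "name": ["phon_R01_S01_1", "sample_2", "sample_3"],
    "feature_a": [0.5, 1.5, 2.5],
    "status": [1, 0, 1],
    "feature_b": [10.0, 20.0, 30.0],
})

# Positional version from the post: remove 'status', then skip column 0
features_by_position = data.loc[:, data.columns != "status"].values[:, 1:]

# By-name version: drop both non-feature columns explicitly
features_by_name = data.drop(columns=["name", "status"]).values

labels = data["status"].values

print(features_by_name.shape)
print(features_by_name.dtype)
```

Both versions select the same values. The by-name version also yields a numeric array directly, because the string column is removed before the conversion to NumPy.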

With this change, the accuracy of the model is 94.87%.
During my research, I found a Python project that is quite similar to this one, and it is worth going through: Python Project-Detecting Parkinson's Disease

Corrected code:
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Read the Parkinson's data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
data = pd.read_csv(url)
print(data.head())

# Extract features and labels; [:, 1:] drops the string column at index 0
features = data.loc[:, data.columns != 'status'].values[:, 1:]
labels = data.loc[:, 'status'].values

# Scale the features
scaler = StandardScaler()
features = scaler.fit_transform(features)
print(features.shape)

# Split the dataset into training and test sets
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=7)

# Train the classifier
model = XGBClassifier()
model.fit(x_train, y_train)

# Predict on the test set and print the accuracy as a percentage
y_pred = model.predict(x_test)
print(accuracy_score(y_test, y_pred) * 100)
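The code above scales the full dataset before splitting it, so the scaler also sees the test rows. A common variation, sketched below, is to split first and fit the scaler on the training rows only. This is not part of the original answer. It uses random synthetic data in place of the real dataset and leaves the classifier out, so it says nothing about how the reported accuracy would change.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features and labels (assumption: 50 rows, 4 columns)
rng = np.random.default_rng(7)
features = rng.normal(loc=5.0, scale=2.0, size=(50, 4))
labels = rng.integers(0, 2, size=50)

# Split first, so the test rows stay unseen during scaling
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=7
)

# Fit the scaler on the training rows only, then reuse it for the test rows
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled.shape, x_test_scaled.shape)
```

The scaled arrays can then be passed to model.fit and model.predict in place of x_train and x_test.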