Python Project - Parkinson's Detection
Python Forum, General Coding Help (https://python-forum.io/thread-21388.html)
Python Project - Parkinson's Detection - shivani - Sep-27-2019

Hi, I am trying to build a machine learning model for the Parkinson's dataset. I am having trouble extracting the features from the dataset, and I need help extracting the right features and labels.

    import numpy as np
    import pandas as pd
    import os, sys
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    #read parkinsons data
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
    data=pd.read_csv(url)
    print(data.head())

    #extract features and labels
    features=data.loc[:,data.columns!='status'].values
    labels=data.loc[:,'status'].values

    #Scale the features
    scaler=StandardScaler()
    features=scaler.fit_transform(features)
    print(features.shape)

    #Splitting the dataset
    x_train,x_test,y_train,y_test=train_test_split(features, labels, test_size=0.2, random_state=7)

    model=XGBClassifier()
    model.fit(x_train,y_train)

    y_pred=model.predict(x_test)
    print(accuracy_score(y_test, y_pred)*100)

Running this gives me an error.
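StandardScaler expects numeric input, so before scaling it can help to check whether any column holds text. The following is only a minimal sketch on a small made-up frame: the column names `feature_a` and `feature_b` and all the values are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Made-up stand-in for the real table: one text column, two numeric
# feature columns, and the 'status' label. All values are illustrative.
df = pd.DataFrame({
    "name": ["rec_1", "rec_2", "rec_3"],
    "feature_a": [119.9, 122.4, 116.7],
    "status": [1, 1, 0],
    "feature_b": [0.0078, 0.0097, 0.0105],
})

# List the columns whose dtype is not numeric; these cannot be scaled as-is.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

On the real data the same check would be run on `data` right after `pd.read_csv(url)`.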
RE: Python Project - Parkinson's Detection - jefsummers - Sep-27-2019

Going by the error message, features contains some non-numeric text, which throws the exception at line 20. I suggest you print features on the line before that to see what it contains.

RE: Python Project - Parkinson's Detection - karansingh - Sep-28-2019

You are getting this error because your dataset contains a name column, which is of string type. In this case the name is not a useful feature for making predictions, so we need to exclude the first column from our features. Use this instead:

    features=data.loc[:,data.columns!='status'].values[:,1:]

This takes all the rows, and the columns from index 1 to the end, which skips the first column (index 0). The accuracy of the model is 94.87%.

During my research, I found a Python project that is quite similar to this one, and you should go through it: Python Project - Detecting Parkinson's Disease

Corrected code:

    import numpy as np
    import pandas as pd
    import os, sys
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    #read parkinsons data
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
    data=pd.read_csv(url)
    print(data.head())

    #extract features and labels
    features=data.loc[:,data.columns!='status'].values[:,1:]
    labels=data.loc[:,'status'].values

    #Scale the features
    scaler=StandardScaler()
    features=scaler.fit_transform(features)
    print(features.shape)

    #Splitting the dataset
    x_train,x_test,y_train,y_test=train_test_split(features, labels, test_size=0.2, random_state=7)

    model=XGBClassifier()
    model.fit(x_train,y_train)

    y_pred=model.predict(x_test)
    print(accuracy_score(y_test, y_pred)*100)
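The positional slice `[:,1:]` works only while the text column stays first. As a hedged alternative, the same columns can be dropped by name. The sketch below compares the two on a small made-up frame; `feature_a`, `feature_b`, and the values are invented for illustration, and only the `name` and `status` column names come from the thread.

```python
import pandas as pd

# Made-up stand-in for the real table; values are illustrative only.
df = pd.DataFrame({
    "name": ["rec_1", "rec_2", "rec_3"],
    "feature_a": [119.9, 122.4, 116.7],
    "status": [1, 1, 0],
    "feature_b": [0.0078, 0.0097, 0.0105],
})

# Approach from the reply: drop 'status' by name, then skip column 0 by position.
# The text column is still present when .values is taken, so this array
# has object dtype even after the slice.
by_position = df.loc[:, df.columns != "status"].values[:, 1:]

# Name-based alternative: drop both non-feature columns explicitly
# and ask for a float array directly.
by_name = df.drop(columns=["name", "status"]).to_numpy(dtype=float)

labels = df["status"].to_numpy()

print(by_position.dtype, by_name.dtype)
print(by_name.shape)
```

Both arrays hold the same numbers. The name-based version does not depend on column order and already has a float dtype.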