Sep-27-2019, 12:13 PM
Hi, I am trying to build a machine learning model for the Parkinson's dataset. I am having trouble extracting the features from the dataset, and I need help extracting the right features and labels.
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

#read parkinsons data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
data=pd.read_csv(url)
print(data.head())

#extract features and labels
features=data.loc[:,data.columns!='status'].values
labels=data.loc[:,'status'].values

#Scale the features
scaler=StandardScaler()
features=scaler.fit_transform(features)
print(features.shape)

#Splitting the dataset
x_train,x_test,y_train,y_test=train_test_split(features, labels, test_size=0.2, random_state=7)

model=XGBClassifier()
model.fit(x_train,y_train)

y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)

I am getting the following error:
Error:
Traceback (most recent call last):
  File "D:\practise\parkinsons detection\detect_parkinson.py", line 20, in <module>
    features=scaler.fit_transform(features)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\base.py", line 553, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\preprocessing\data.py", line 639, in fit
    return self.partial_fit(X, y)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\preprocessing\data.py", line 663, in partial_fit
    force_all_finite='allow-nan')
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 496, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\Asus4\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'phon_R01_S01_1'
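
The value in the error message, 'phon_R01_S01_1', looks like a recording identifier rather than a measurement, which suggests that a string column is still part of the feature matrix when it reaches StandardScaler. A minimal sketch of one possible fix follows, assuming that identifier column is called name. The small DataFrame, its measurement column names and values, and the helper split_features_labels are made up for illustration so the snippet runs without downloading the file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for parkinsons.data: a string identifier column
# (assumed to be called 'name'), two made-up numeric measurements, and
# the 'status' label column used in the script above.
data = pd.DataFrame({
    "name": ["phon_R01_S01_1", "phon_R01_S01_2", "phon_R01_S07_1"],
    "measure_a": [0.12, 0.15, 0.40],
    "measure_b": [21.0, 19.5, 30.2],
    "status": [1, 1, 0],
})


def split_features_labels(frame, label_col="status"):
    """Return (features, labels), leaving non-numeric columns out of features."""
    labels = frame[label_col].values
    features = (
        frame.drop(columns=[label_col])          # the label is not a feature
             .select_dtypes(include=[np.number])  # drops string columns such as 'name'
             .values
    )
    return features, labels


features, labels = split_features_labels(data)
print(features.shape, features.dtype)
```

With only numeric columns left, the array has a float dtype and can be passed to scaler.fit_transform(features) as in the original script. If the column name is known in advance, data.drop(columns=['name', 'status']) would be an equivalent and more explicit way to build the features.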