Jan-12-2018, 01:10 PM
(This post was last modified: Jan-12-2018, 01:10 PM by python_newbie09.)
Hi, I am trying to prepare the Car Evaluation dataset from the UCI repository so that I can run a KNN algorithm on it, and I first need to convert the categorical data into numerical values. I know how to convert one column, but I am having difficulty converting multiple columns. My code snippet is below. I am very new to Python, so this may look messy. The results are not what I expected: not all the columns were encoded correctly, and I am not sure if I am doing this the right way.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# importing the dataset
attributes = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']
target = ['acceptability']
dataset = pd.read_csv('car.data', names=attributes + target)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 6].values

# handling categorical data
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
X[:, 1] = labelencoder_X.fit_transform(X[:, 1])
X[:, 2] = labelencoder_X.fit_transform(X[:, 2])
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
X[:, 4] = labelencoder_X.fit_transform(X[:, 4])
X[:, 5] = labelencoder_X.fit_transform(X[:, 5])

# perform dummy encoding to put the data into a standardized format
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
onehotencoder = OneHotEncoder(categorical_features=[1])
X = onehotencoder.fit_transform(X).toarray()
onehotencoder = OneHotEncoder(categorical_features=[2])
X = onehotencoder.fit_transform(X).toarray()
onehotencoder = OneHotEncoder(categorical_features=[3])
X = onehotencoder.fit_transform(X).toarray()
onehotencoder = OneHotEncoder(categorical_features=[4])
X = onehotencoder.fit_transform(X).toarray()
onehotencoder = OneHotEncoder(categorical_features=[5])
X = onehotencoder.fit_transform(X).toarray()

Any help on this will be much appreciated.
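
A likely cause of the unexpected result: each OneHotEncoder call with categorical_features places the new dummy columns first and stacks the remaining columns to their right, so after the first call the indices 1 to 5 no longer point at the original columns. Note also that categorical_features was removed in later scikit-learn releases. One way around both issues is to encode all feature columns in a single step. Below is a minimal sketch using pandas' get_dummies; the four sample rows are made up for illustration (they only mimic the layout of car.data), and the names encoded, y_codes and y_labels are hypothetical.

```python
import pandas as pd

attributes = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']
target = ['acceptability']

# Illustrative rows only (an assumption, not the real data). With the actual
# file you would use: pd.read_csv('car.data', names=attributes + target)
rows = [
    ['vhigh', 'vhigh', '2',     '2',    'small', 'low',  'unacc'],
    ['high',  'med',   '4',     '4',    'big',   'high', 'acc'],
    ['low',   'low',   '5more', 'more', 'med',   'med',  'good'],
    ['med',   'high',  '3',     '4',    'small', 'high', 'unacc'],
]
dataset = pd.DataFrame(rows, columns=attributes + target)

# One-hot encode every feature column in one call; each category becomes
# its own 0/1 column named like 'buying_vhigh'.
encoded = pd.get_dummies(dataset[attributes], columns=attributes)
X = encoded.to_numpy(dtype=float)

# Turn the target labels into integer codes; y_labels keeps the mapping
# from code back to the original label.
y_codes, y_labels = pd.factorize(dataset['acceptability'])

print(encoded.columns.tolist())
print(X.shape)
print(y_codes, list(y_labels))
```

Because all columns are encoded together, no column index ever shifts. Each row of X contains exactly one 1 per original feature, which is an easy sanity check on the real dataset as well.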