Jun-21-2018, 07:01 AM
Hi everyone, I am trying to convert a variable from text to float or int so I can feed it to my model.
I first factorize it then use OneHotEncoder like below:
housing_cat = housing["ocean_proximity"]
housing_cat.head(10)
housing_cat_encoded, housing_categories = housing_cat.factorize()
housing_cat_encoded[:10]

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing_cat_encoded.reshape(-1,1))
housing_cat_1hot
housing_cat_1hot.toarray()

I then try to feed it to a pipeline and feature union it together like below:
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('cat_encoder', OneHotEncoder(sparse=False))])

from sklearn.pipeline import FeatureUnion
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

However, I get the below error:
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'NEAR BAY'
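For what it's worth, the same message can be reproduced without scikit-learn at all: it is what Python raises when a non-numeric string is cast to float. In the cat_pipeline above, the column goes from the selector straight into OneHotEncoder without the factorize step from the first snippet, so it looks likely that the raw strings are what reach the encoder. A minimal sketch using only the standard library (the helper name `try_float` is hypothetical):

```python
# Reproduce the error message with a plain float() cast (standard library only).
def try_float(value):
    """Attempt a float conversion; return (ok, result_or_error_message)."""
    try:
        return True, float(value)
    except ValueError as exc:
        return False, str(exc)


# A category label is not a numeric string, so the cast fails.
ok, message = try_float("NEAR BAY")
print(ok, message)
```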
The first 10 entries of ocean_proximity look like this:
14196 NEAR OCEAN
8267 NEAR OCEAN
17445 NEAR OCEAN
14265 NEAR OCEAN
2271 INLAND
17848 <1H OCEAN
6252 <1H OCEAN
9389 NEAR BAY
6113 <1H OCEAN
6061 <1H OCEAN
Name: ocean_proximity, dtype: object
I could just drop the variable, but I'd like to learn how to deal with text variables as well.
I appreciate any help.
Thanks
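To make clear what I am after, here is a minimal sketch of the two steps done by hand on the ten entries printed above: first map each text category to an integer code, then expand the codes into one-hot rows. It uses only the standard library, and the function names `factorize` and `one_hot` are hypothetical stand-ins, not the pandas or scikit-learn implementations.

```python
# Hand-rolled integer encoding followed by one-hot encoding (standard library only).

def factorize(values):
    """Map each distinct value to an integer code, in order of first appearance."""
    categories = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return codes, categories


def one_hot(codes, n_categories):
    """Turn integer codes into rows with a single 1.0 in the code's position."""
    return [[1.0 if code == k else 0.0 for k in range(n_categories)]
            for code in codes]


# The ten ocean_proximity entries shown in the post.
ocean_proximity = [
    "NEAR OCEAN", "NEAR OCEAN", "NEAR OCEAN", "NEAR OCEAN", "INLAND",
    "<1H OCEAN", "<1H OCEAN", "NEAR BAY", "<1H OCEAN", "<1H OCEAN",
]

codes, categories = factorize(ocean_proximity)
rows = one_hot(codes, len(categories))
print(categories)
print(codes)
print(rows[4])  # the row for "INLAND"
```

The point of the sketch is that the text has to become integer codes before it becomes numeric columns. If I understand correctly, `pandas.get_dummies` or a scikit-learn release from 0.20 onward (where `OneHotEncoder` accepts string categories directly) would cover the same ground, but I have not confirmed that against my setup.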