Python Forum
Text to integer with OneHotEncoder


Text to integer with OneHotEncoder - Scott - Jun-21-2018

Hi everyone, I am trying to convert a variable from text to float or int so I can feed it to my model.

I first factorize it, then use OneHotEncoder as shown below:

housing_cat = housing["ocean_proximity"]
housing_cat.head(10)
housing_cat_encoded, housing_categories = housing_cat.factorize()
housing_cat_encoded[:10]

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing_cat_encoded.reshape(-1,1))
housing_cat_1hot
housing_cat_1hot.toarray()
I then try to feed it to a pipeline and combine it with the numeric features using FeatureUnion, as shown below:

   
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
        ('selector', DataFrameSelector(num_attribs)),
        ('imputer', Imputer(strategy="median")),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler()),
    ])

cat_pipeline = Pipeline([
        ('selector', DataFrameSelector(cat_attribs)),
        ('cat_encoder', OneHotEncoder(sparse=False))])

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])
                
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
However, I get the error below:
array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: 'NEAR BAY'

The first 10 entries of ocean_proximity look like this:
14196 NEAR OCEAN
8267 NEAR OCEAN
17445 NEAR OCEAN
14265 NEAR OCEAN
2271 INLAND
17848 <1H OCEAN
6252 <1H OCEAN
9389 NEAR BAY
6113 <1H OCEAN
6061 <1H OCEAN
Name: ocean_proximity, dtype: object

I could just drop the variable but I'd like to learn how to deal with text variables as well.

I appreciate any help.
Thanks


RE: Text to integer with OneHotEncoder - gontajones - Jun-21-2018

Where is this line in your code?
array = np.array(array, dtype=dtype, order=order, copy=copy)
Your code is trying to convert a string to a float, but it is receiving 'NEAR BAY' as input rather than a numeric string like '14196'.
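
The conversion described above can be reproduced in isolation. The sketch below is not taken from the posted code: it calls np.array directly on two made-up sample strings to show that forcing text into a float dtype raises the same kind of ValueError. The np.array line in the traceback most likely comes from library code that validates the input (scikit-learn, presumably), not from the script itself.

```python
import numpy as np

# Hypothetical sample values, taken from the ocean_proximity column shown above.
values = ["NEAR BAY", "INLAND"]

message = None
try:
    # Asking NumPy for a float array forces float("NEAR BAY"), which fails.
    np.array(values, dtype=np.float64)
except ValueError as exc:
    message = str(exc)
    print("ValueError:", message)
```

A string such as '14196' would convert without error, which is why the failure only shows up once a text category reaches a step that expects numbers.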


RE: Text to integer with OneHotEncoder - Scott - Jun-22-2018

Where would this code go? Also, array is not defined. Is this just a general example?
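
One possible way forward, sketched under assumptions: in scikit-learn 0.20 and later, OneHotEncoder accepts string categories directly, so the text column can be encoded without factorizing it first. Older releases expected integer input, which would explain the error when raw strings reach the encoder inside the pipeline. The small DataFrame below is a made-up stand-in for housing, not the real data.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the housing DataFrame, using values from the sample above.
housing = pd.DataFrame({
    "ocean_proximity": ["NEAR OCEAN", "INLAND", "<1H OCEAN", "NEAR BAY", "<1H OCEAN"],
})

# Assumption: scikit-learn >= 0.20, where OneHotEncoder handles strings.
encoder = OneHotEncoder()

# Double brackets keep the input two-dimensional (n_samples, 1).
# The default output is a sparse matrix, so convert it to a dense array.
housing_cat_1hot = encoder.fit_transform(housing[["ocean_proximity"]]).toarray()

print(encoder.categories_[0])   # the categories the encoder learned
print(housing_cat_1hot.shape)   # one column per distinct category
```

On an older scikit-learn, the factorize step from the first post would still be needed before the encoder sees the data, since that is what turns the strings into integers.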