Python Forum
Text to integer with OneHotEncoder
#1
Hi everyone, I am trying to convert a variable from text to float or int so I can feed it to my model.

I first factorize it, then use OneHotEncoder, like below:

housing_cat = housing["ocean_proximity"]
housing_cat.head(10)
housing_cat_encoded, housing_categories = housing_cat.factorize()
housing_cat_encoded[:10]

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
housing_cat_1hot = encoder.fit_transform(housing_cat_encoded.reshape(-1,1))
housing_cat_1hot
housing_cat_1hot.toarray()
I then try to feed it into a pipeline and combine the two pipelines with FeatureUnion, like below:

   
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
        ('selector', DataFrameSelector(num_attribs)),
        ('imputer', Imputer(strategy="median")),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler()),
    ])

cat_pipeline = Pipeline([
        ('selector', DataFrameSelector(cat_attribs)),
        ('cat_encoder', OneHotEncoder(sparse=False))])

from sklearn.pipeline import FeatureUnion

full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])
                
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
However, I get the below error:
array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: 'NEAR BAY'

The first 10 entries of ocean_proximity look like this:
14196 NEAR OCEAN
8267 NEAR OCEAN
17445 NEAR OCEAN
14265 NEAR OCEAN
2271 INLAND
17848 <1H OCEAN
6252 <1H OCEAN
9389 NEAR BAY
6113 <1H OCEAN
6061 <1H OCEAN
Name: ocean_proximity, dtype: object

I could just drop the variable but I'd like to learn how to deal with text variables as well.

I appreciate any help.
Thanks
Reply
#2
Where is this line in your code?
array = np.array(array, dtype=dtype, order=order, copy=copy)
Your code is trying to convert a string to float, but it is getting 'NEAR BAY' as input, not a number like '14196'.
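
To illustrate the point, here is a minimal sketch of the same failure and one way around it. The small Series is a made-up stand-in for housing["ocean_proximity"], and building the one-hot rows with np.eye is just one possible approach, not necessarily what your pipeline should do.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for housing["ocean_proximity"] (labels taken from the post).
housing_cat = pd.Series(
    ["NEAR OCEAN", "INLAND", "<1H OCEAN", "NEAR BAY", "<1H OCEAN"],
    name="ocean_proximity",
)

# Asking NumPy for a float array of text labels fails with the same kind of error.
try:
    np.array(housing_cat.tolist(), dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Mapping each label to an integer code first avoids the conversion problem.
codes, categories = housing_cat.factorize()

# One row per sample, one column per category, with a single 1 in each row.
one_hot = np.eye(len(categories))[codes]
print(list(categories))
print(one_hot)
```

The integer codes from factorize() are what a numeric-only encoder can work with; the raw strings are not.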
Reply
#3
Where would this code go? Also array is not defined. Is this just a general example?
Reply
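
For reference, the traceback line in #1 looks like it comes from scikit-learn's own input validation rather than from code you wrote, so there is nothing to paste anywhere. As far as I know, older OneHotEncoder versions only accepted integer input, which is why the raw strings in cat_pipeline fail. Below is a rough sketch of one way to avoid that, assuming scikit-learn 0.20 or later, where OneHotEncoder accepts string columns and ColumnTransformer can replace the DataFrameSelector plus FeatureUnion setup. The tiny DataFrame and its column values are invented for illustration, and CombinedAttributesAdder is left out because its definition is not shown in the thread.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical miniature version of the `housing` DataFrame.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, np.nan, 5.6, 3.8],
    "housing_median_age": [41.0, 21.0, 52.0, 52.0, 30.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR OCEAN", "<1H OCEAN", "INLAND"],
})

num_attribs = ["median_income", "housing_median_age"]
cat_attribs = ["ocean_proximity"]

# Numeric columns: fill missing values with the median, then standardize.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])

# ColumnTransformer picks the columns itself, so no custom selector is needed.
full_pipeline = ColumnTransformer(
    [
        ("num", num_pipeline, num_attribs),
        ("cat", OneHotEncoder(), cat_attribs),  # strings are encoded directly
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)
```

If you are stuck on an older scikit-learn, factorizing the column to integers before one-hot encoding (as in the first snippet of #1) is the usual workaround, but it has to happen inside the pipeline, not only in a separate cell.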

