Text to integer with OneHotEncoder
Python Forum (https://python-forum.io), General Coding Help, Thread: /thread-11075.html
Text to integer with OneHotEncoder - Scott - Jun-21-2018

Hi everyone,

I am trying to convert a variable from text to float or int so I can feed it to my model. I first factorize it, then use OneHotEncoder like below:

    housing_cat = housing["ocean_proximity"]
    housing_cat.head(10)
    housing_cat_encoded, housing_categories = housing_cat.factorize()
    housing_cat_encoded[:10]

    from sklearn.preprocessing import OneHotEncoder
    encoder = OneHotEncoder()
    housing_cat_1hot = encoder.fit_transform(housing_cat_encoded.reshape(-1,1))
    housing_cat_1hot
    housing_cat_1hot.toarray()

I then try to feed it to a pipeline and join the pipelines with a FeatureUnion like below:

    num_attribs = list(housing_num)
    cat_attribs = ["ocean_proximity"]

    num_pipeline = Pipeline([
        ('selector', DataFrameSelector(num_attribs)),
        ('imputer', Imputer(strategy="median")),
        ('attribs_adder', CombinedAttributesAdder()),
        ('std_scaler', StandardScaler()),
    ])

    cat_pipeline = Pipeline([
        ('selector', DataFrameSelector(cat_attribs)),
        ('cat_encoder', OneHotEncoder(sparse=False))])

    from sklearn.pipeline import FeatureUnion
    full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])

    housing_prepared = full_pipeline.fit_transform(housing)
    housing_prepared

However, I get the below error:

    array = np.array(array, dtype=dtype, order=order, copy=copy)
    ValueError: could not convert string to float: 'NEAR BAY'

The first 10 entries of ocean_proximity look like this:

    14196    NEAR OCEAN
    8267     NEAR OCEAN
    17445    NEAR OCEAN
    14265    NEAR OCEAN
    2271         INLAND
    17848     <1H OCEAN
    6252      <1H OCEAN
    9389       NEAR BAY
    6113      <1H OCEAN
    6061      <1H OCEAN
    Name: ocean_proximity, dtype: object

I could just drop the variable, but I'd like to learn how to deal with text variables as well. I appreciate any help.

Thanks

RE: Text to integer with OneHotEncoder - gontajones - Jun-21-2018

Where is this line in your code?
    array = np.array(array, dtype=dtype, order=order, copy=copy)

Your code is trying to convert a string to float, but it is getting 'NEAR BAY' as input, not a number like '14196'.

RE: Text to integer with OneHotEncoder - Scott - Jun-22-2018

Where would this code go? Also, array is not defined. Is this just a general example?
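
For anyone following along, here is a minimal plain-Python sketch of what the traceback seems to be saying. It is not scikit-learn's implementation, and the helper names `factorize` and `one_hot` are made up for illustration. The `np.array(...)` line in the error appears to come from inside the library rather than from the posted code: the `cat_pipeline` hands the raw `ocean_proximity` strings to the encoder, which then tries to cast them to float. Turning the labels into integer codes first (the same idea as the `housing_cat.factorize()` call in the first snippet) avoids that cast. As far as I know, `OneHotEncoder` in scikit-learn 0.20 and later accepts string columns directly, so upgrading may be the simpler route.

```python
# The first ten ocean_proximity values quoted in the original post.
ocean_proximity = [
    "NEAR OCEAN", "NEAR OCEAN", "NEAR OCEAN", "NEAR OCEAN", "INLAND",
    "<1H OCEAN", "<1H OCEAN", "NEAR BAY", "<1H OCEAN", "<1H OCEAN",
]


def factorize(values):
    """Hypothetical helper: map each distinct label to an integer code,
    numbered in order of first appearance."""
    index = {}        # label -> integer code
    categories = []   # integer code -> label
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return codes, categories


def one_hot(codes, n_categories):
    """Hypothetical helper: one row per sample, one column per category."""
    return [[1.0 if code == col else 0.0 for col in range(n_categories)]
            for code in codes]


# Casting a raw label to float fails with the same message as the traceback.
try:
    float(ocean_proximity[7])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Encoding to integer codes first means no string ever reaches a float cast.
codes, categories = factorize(ocean_proximity)
encoded = one_hot(codes, len(categories))

print(error_message)
print(categories)
print(encoded[7])
```

In the posted pipeline the equivalent step would have to happen inside `cat_pipeline`, before the one-hot encoder sees the column, rather than in a separate cell that the pipeline never uses.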