ValueError in array while trying to do machine learning

Mariaoye · Nov-12-2020, 12:35 AM

Hi All,

I am trying to predict whether income is 70000+ from two categorical fields (Sex and Highest Cert,dip,deg) using the Python code below.

I binned the average employment income into ranges and then picked the ranges I want to predict (70000+), using Sex and Highest Cert,dip,deg as the inputs.
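A minimal sketch of the binning step, using a small made-up DataFrame (the income values are hypothetical, not taken from data.pkl). With pd.cut's default right=True each bin includes its right edge, so an income of exactly 70000 lands in "50000-70000", and the two top labels together mean "strictly above 70000". That lets the two dummy columns collapse into one boolean target, which is what a classifier expects as y.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.pkl contents
df = pd.DataFrame({"Avg Emp Income": [25000, 45000, 70000, 85000, 120000]})

bins = [0, 30000, 50000, 70000, 100000, np.inf]
names = ["<30000", "30000-50000", "50000-70000", "70000-100000", "100000+"]

# Bins are right-inclusive by default: (0, 30000], (30000, 50000], ...
df["Avg Emp Income Range"] = pd.cut(df["Avg Emp Income"], bins, labels=names)

# One boolean target for "70000+" instead of two separate dummy columns
df["High Income"] = df["Avg Emp Income Range"].isin(["70000-100000", "100000+"])

print(df)
```

The same target can also be written directly as df["Avg Emp Income"] > 70000, without going through the bins.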

I have the following code. However, I get an error when I reach the one-hot encoding part. I am using Python in Visual Studio. I have tried changing the categorical field to "Age", but that does not work either. The code is below.

# %% read dataframe from part1
import pandas as pd

df = pd.read_pickle("data.pkl")

#%%
import numpy as np
bins = [0, 30000, 50000, 70000, 100000, np.inf]
names = ['<30000', '30000-50000', '50000-70000', '70000-100000', '100000+']

df['Avg Emp Income Range'] = pd.cut(df['Avg Emp Income'], bins, labels=names)

#%% OHE of Avg empl income
for val in df["Avg Emp Income Range"].unique():
    df[f"Avg Emp Income Range_{val}"] = df["Avg Emp Income Range"] == val

#%% selecting data
x= ["Sex","Highest Cert,dip,deg"]

#%%
success=["Avg Emp Income Range_70000-100000","Avg Emp Income Range_100000+"]
y=success

# %% split into training / testing sets
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=123)

#%%
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

enc = OneHotEncoder(handle_unknown="ignore")
ct = ColumnTransformer(
    [
        ("ohe", enc, ["Sex","Highest Cert,dip,deg",])
    ],
    remainder="passthrough",
)

x_train = ct.fit_transform(x_train)
x_test = ct.transform(x_test)

Error: ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Users\maria\Documents\Project Capstone 2\Z NO\machine L.py in <module>
     42 )
     43 
---> 44 x_train = ct.fit_transform(x_train)
     45 x_test = ct.transform(x_test)

c:\Users\maria\Documents\Project Capstone 2\Z NO\venv\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
    522         else:
    523             self._feature_names_in = None
--> 524         X = _check_X(X)
    525         # set n_features_in_ attribute
    526         self._check_n_features(X, reset=True)

c:\Users\maria\Documents\Project Capstone 2\Z NO\venv\lib\site-packages\sklearn\compose\_column_transformer.py in _check_X(X)
    649     if hasattr(X, '__array__') or sparse.issparse(X):
    650         return X
--> 651     return check_array(X, force_all_finite='allow-nan', dtype=np.object)
    652 
    653 

c:\Users\maria\Documents\Project Capstone 2\Z NO\venv\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

c:\Users\maria\Documents\Project Capstone 2\Z NO\venv\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    621                     "Reshape your data either using array.reshape(-1, 1) if "
    622                     "your data has a single feature or array.reshape(1, -1) "
--> 623                     "if it contains a single sample.".format(array))
    624 
    625         # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=['Sex'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
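The traceback points at the inputs rather than at the encoder: x and y are plain Python lists of column names, not data. train_test_split therefore splits the two strings themselves, leaving a one-element list such as ['Sex'] in x_train, and ColumnTransformer rejects that 1-D input. A small reproduction, followed by the usual fix of selecting the columns from the DataFrame (the tiny DataFrame below is hypothetical and only mirrors the column names in the post):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

x = ["Sex", "Highest Cert,dip,deg"]
y = ["Avg Emp Income Range_70000-100000", "Avg Emp Income Range_100000+"]

# Splitting lists of names splits the names, not rows of data
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=123)
print(x_train)  # a one-element list of column-name strings

# Hypothetical data with the same column names as the post
df = pd.DataFrame({
    "Sex": ["F", "M", "F", "M"],
    "Highest Cert,dip,deg": ["Bachelor", "None", "Master", "Bachelor"],
    "Avg Emp Income": [65000, 40000, 90000, 110000],
})

# Select the feature columns as a 2-D DataFrame and build a 1-D target
X = df[["Sex", "Highest Cert,dip,deg"]]
target = df["Avg Emp Income"] > 70000
X_train, X_test, t_train, t_test = train_test_split(X, target, random_state=123)
print(X_train.shape)  # (3, 2): rows x feature columns
```

Passing X_train (a DataFrame) to ct.fit_transform then works, because ColumnTransformer can look up "Sex" and "Highest Cert,dip,deg" by name.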

What am I doing wrong?

Thank you.
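A minimal end-to-end sketch of the intended workflow, under these assumptions: the DataFrame is hypothetical (it only mirrors the column names in the post), the target is a single 0/1 "income above 70000" column rather than two dummy columns, and LogisticRegression is an arbitrary choice of classifier since the post does not name one. Wrapping the ColumnTransformer and the model in a Pipeline keeps the encoder fitted on the training split only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; in the real script this would come from pd.read_pickle("data.pkl")
df = pd.DataFrame({
    "Sex": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "Highest Cert,dip,deg": ["Bachelor", "None", "Master", "Bachelor",
                             "None", "Master", "Bachelor", "None"],
    "Avg Emp Income": [65000, 40000, 90000, 110000, 30000, 95000, 72000, 45000],
})

features = ["Sex", "Highest Cert,dip,deg"]
X = df[features]                                # 2-D DataFrame, not a list of names
y = (df["Avg Emp Income"] > 70000).astype(int)  # 1 means "70000+"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

model = Pipeline([
    ("prep", ColumnTransformer(
        [("ohe", OneHotEncoder(handle_unknown="ignore"), features)],
        remainder="passthrough",
    )),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)
pred = model.predict(X_test)

# zero_division avoids warnings when a tiny test split has no positives
print("accuracy ", accuracy_score(y_test, pred))
print("precision", precision_score(y_test, pred, zero_division=0))
print("recall   ", recall_score(y_test, pred, zero_division=0))
print("f1       ", f1_score(y_test, pred, zero_division=0))
```

On real data, passing stratify=y to train_test_split keeps the share of 70000+ earners similar in both splits.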
