IndexError in Array while trying to do machine learning - Mariaoye - Nov-12-2020

Hi All,

I am trying to predict high income (70000+) from two categorical fields, "Sex" and "Highest Cert,dip,deg", using the Python code below. I binned the average income into ranges and then picked the ranges (70000+) that I want to predict from those two fields.

However, I get an IndexError when I reach the one-hot encoding part of the code. I am using Python in Visual Studio. I have tried changing the categorical field to "Age", but that does not work either. The code is below.

# %% read dataframe from part1
import pandas as pd
df = pd.read_pickle("data.pkl")

# %% bin the average income into ranges
import numpy as np
bins = [0, 30000, 50000, 70000, 100000, np.inf]
names = ['<30000', '30000-50000', '50000-70000', '70000-100000', '100000+']
df['Avg Emp Income Range'] = pd.cut(df['Avg Emp Income'], bins, labels=names)

# %% OHE of Avg empl income
for val in df["Avg Emp Income Range"].unique():
    df[f"Avg Emp Income Range_{val}"] = df["Avg Emp Income Range"] == val

# %% selecting data
x = ["Sex", "Highest Cert,dip,deg"]

# %%
success = ["Avg Emp Income Range_70000-100000", "Avg Emp Income Range_100000+"]
y = success

# %% split into training / testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=123)

# %% one-hot encode the categorical features
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

enc = OneHotEncoder(handle_unknown="ignore")
ct = ColumnTransformer(
    [
        ("ohe", enc, ["Sex", "Highest Cert,dip,deg"])
    ],
    remainder="passthrough",
)
x_train = ct.fit_transform(x_train)
x_test = ct.transform(x_test)

Please, what am I doing wrong? Thank you.
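A likely cause, judging only from the code as posted: `x` and `y` are plain Python lists of column names, not data taken from `df`. `train_test_split` therefore splits two short lists of strings, and the `ColumnTransformer` is then asked to look up the columns "Sex" and "Highest Cert,dip,deg" in a list rather than in a DataFrame. The sketch below shows one way to select the data first. The small DataFrame is a made-up stand-in for `data.pkl`, and collapsing the two 70000+ ranges into a single 0/1 target is an assumption about the intended prediction, not something the post confirms.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for data.pkl; the real columns and values may differ.
df = pd.DataFrame({
    "Sex": ["Male", "Female"] * 6,
    "Highest Cert,dip,deg": ["Bachelor", "Diploma", "Master"] * 4,
    "Avg Emp Income": [25000, 45000, 65000, 85000, 120000, 72000,
                       31000, 99000, 150000, 52000, 68000, 71000],
})

# Same binning as in the post.
bins = [0, 30000, 50000, 70000, 100000, np.inf]
names = ["<30000", "30000-50000", "50000-70000", "70000-100000", "100000+"]
df["Avg Emp Income Range"] = pd.cut(df["Avg Emp Income"], bins, labels=names)

# Select the actual data: x is a DataFrame, not a list of column names.
feature_cols = ["Sex", "Highest Cert,dip,deg"]
x = df[feature_cols]

# Assumed target: 1 if the income range is 70000 or above, else 0.
y = df["Avg Emp Income Range"].isin(["70000-100000", "100000+"]).astype(int)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=123)

# String column names work here because x_train is a DataFrame.
ct = ColumnTransformer(
    [("ohe", OneHotEncoder(handle_unknown="ignore"), feature_cols)],
    remainder="passthrough",
)
x_train_enc = ct.fit_transform(x_train)
x_test_enc = ct.transform(x_test)
```

With `x` and `y` taken from `df`, the manual loop that creates the `Avg Emp Income Range_...` columns is not needed for the target; it could be kept if those columns are used elsewhere.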