Python LogisticRegression

albertjblack · (This post was last modified: May-23-2021, 09:41 AM by Larz60+.)

Hello, can anyone help?
I do not understand what is happening here.
PYTHON CODE

    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label=i)
    plt.xlabel("Age")
    plt.ylabel("Salary")
    plt.legend()
    plt.show()

QUESTIONS:
What is i and j?

What are enumerate and np.unique doing?

What is x_set[y_set == j, 0] doing?

What is the '(i)' in color = ListedColormap(('red', 'green'))(i) doing here?
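
To make the questions concrete, here is a minimal sketch that runs each piece of the loop on its own. The toy arrays are assumed values, not rows from data.csv, and the names classes, pairs, mask and cmap exist only for this illustration.

```python
import numpy as np
from matplotlib.colors import ListedColormap

# Toy stand-ins for x_set / y_set (assumed values, not from data.csv)
x_set = np.array([[25, 30000], [40, 80000], [31, 42000], [52, 95000]])
y_set = np.array([0, 1, 0, 1])

# np.unique returns the sorted distinct labels as an array
classes = np.unique(y_set)

# enumerate pairs every label j with its position i in that array
pairs = list(enumerate(classes))

# Calling a ListedColormap with an integer picks the color at that index
cmap = ListedColormap(("red", "green"))

for i, j in pairs:
    mask = y_set == j          # boolean array, True where the label equals j
    ages = x_set[mask, 0]      # column 0 of the rows selected by the mask
    salaries = x_set[mask, 1]  # column 1 of the same rows
    print(i, j, mask, ages, salaries, cmap(i))
```

Because the labels here are 0 and 1, i and j happen to be equal on every pass. With labels such as 3 and 7, j would be 3 and 7 while i would still be 0 and 1.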

///
I would really appreciate your help, the more detailed the better!
/// full code

"""MODEL TO PREDICT IF USER WILL BUY THE SUV - X:(AGE,SALARY) | Y:(0/1)"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv("data.csv")
x = dataset.iloc[:, 2:-1].values
y = dataset.iloc[:, -1].values

# Split first, then fit the scaler on the training FEATURES only
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
# y holds the class labels (0/1), so it is left unscaled

# Fitting model to training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

# Making the confusion matrix - it contains the counts of correct and incorrect predictions
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, classifier.predict(x_test)) # rows = true class, columns = predicted class -> [[TN, FP], [FN, TP]]
print(cm)

"""CHARTING - VISUALIZING"""

# Visualizing the training set results: build a grid of points over the feature space and apply the classifier to every grid point
from matplotlib.colors import ListedColormap # maps class indices to the listed colors
x_set, y_set = x_train, y_train # local aliases, so the same plotting code can be reused with other variables
# Build the grid: from the minimum to the maximum of column 0 (age), then of column 1 (salary), with a step of 0.01
# (a +-1 margin could be added to these limits so that points at the edges are not squeezed)
x1, x2 = np.meshgrid(np.arange(x_set[:,0].min(), x_set[:,0].max(), 0.01), np.arange(x_set[:,1].min(), x_set[:,1].max(), 0.01))
# Predict the class of every grid point and draw filled contours: class 0 regions in red, class 1 regions in green
# np.transpose turns the two flattened grids into an (n_points, 2) array, the same layout as x_train
# reshape(x1.shape) puts the predictions back on the grid, so contourf plots the predicted regions
plt.contourf(x1, x2, classifier.predict(np.transpose(np.array([x1.ravel(), x2.ravel()]))).reshape(x1.shape), alpha=0.75, cmap=ListedColormap(('red', 'green')))
# Set the axis limits to the extent of the grid
plt.xlim(np.min(x1), np.max(x1)) # np.min(x1) is equivalent to x1.min()
plt.ylim(x2.min(), x2.max())
# Plot all data points (real values), one scatter call per class
# np.unique returns a sorted array of the distinct labels, e.g. array([0, 1]); enumerate pairs each label with its index
for i, j in enumerate(np.unique(y_set)): # y_set is y_train // i = index (0, then 1) // j = class label (0, then 1)
    # y_set == j is a boolean mask; x_set[mask, 0] selects column 0 (age) of the rows whose label is j
    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label=i)
plt.xlabel("Age")
plt.ylabel("Salary")
plt.legend()
plt.show()
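
The grid prediction that feeds plt.contourf is dense on one line, so here is a small sketch of the same steps on a coarse grid. The toy features, the 0.5 step and the names x_toy, y_toy, clf, grid_points and grid_pred are assumptions for illustration only, and np.column_stack is used as an equivalent of the np.transpose(np.array([...])) expression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy, already-scaled features (assumed values standing in for x_train / y_train)
x_toy = np.array([[-1.0, -1.0], [-0.5, -0.8], [0.6, 0.9], [1.0, 1.2]])
y_toy = np.array([0, 0, 1, 1])
clf = LogisticRegression(random_state=0).fit(x_toy, y_toy)

# Coarse grid (step 0.5 instead of 0.01) so the shapes stay easy to read
x1, x2 = np.meshgrid(np.arange(-1.0, 1.5, 0.5), np.arange(-1.0, 1.5, 0.5))

# Flatten both grids and pair them up: one row per grid point, two columns
grid_points = np.column_stack([x1.ravel(), x2.ravel()])

# Predict a class for every grid point, then fold the result back onto the grid
grid_pred = clf.predict(grid_points).reshape(x1.shape)

print(x1.shape, grid_points.shape, grid_pred.shape)
print(grid_pred)
```

Each entry of grid_pred is the predicted class at one grid position, which is the array contourf colors red or green.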

Larz60+ write May-23-2021, 09:40 AM:
Fixed code tags on top script
