Python Forum
Python LogisticRegression
#1
Hello, can anyone help? I do not understand what is happening here:

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label=i)
plt.xlabel("Age")
plt.ylabel("Salary")
plt.legend()
plt.show()
QUESTIONS:
What are i and j?

What are enumerate and np.unique doing?

What is x_set[y_set == j, 0] doing?

What is the '(i)' in color = ListedColormap(('red', 'green'))(i) doing here?

///
I would really appreciate your help, the more detailed the better!
/// full code
"""MODEL TO PREDICT IF USER WILL BUY THE SUV - X:(AGE,SALARY) | Y:(0/1)"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv("data.csv")
x = dataset.iloc[:, 2:-1].values
y = dataset.iloc[:, -1].values

# apply feature scaling to the FEATURES // after splitting
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
x = sc_x.fit_transform(x)
y = sc_y.fit_transform(y.reshape(len(y),1)).ravel().astype("int")

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Fitting model to training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

# Making the confusion matrix - contains the correct and incorrect predictions
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, classifier.predict(x_test)) # you could also pass a precomputed y_pred_test -> [[correct, incorrect], [incorrect, correct]]
print(cm) 

"""CHARTING - VISUALIZING"""

# Visualizing the training set results -- take every pixel (observation/user) point on a grid and apply the classifier to it
from matplotlib.colors import ListedColormap # listed color map will help to colorize all data points
x_set, y_set = x_train, y_train # local variables so the same plotting code can be reused for other sets
# for each grid point, predict the value and colour it red or green depending on that value // prepare the grid of all pixel points
# ----- take the minimum and maximum of the age values, then of the salary values (+-1 if you don't want the points squeezed against the edges) // step 0.01 resolution
x1, x2 = np.meshgrid(np.arange(x_set[:,0].min(), x_set[:,0].max(), 0.01), np.arange(x_set[:,1].min(), x_set[:,1].max(), 0.01))
# (pred) apply the classifier to every grid point / use contourf to draw the contour between regions -- class 1 is coloured green, class 0 red
# .T "transpose" puts the grid values into the same (rows, 2 columns) shape as x_train
# this will plot the predicted areas 'classifier areas'
plt.contourf(x1,x2,classifier.predict(np.transpose(np.array([x1.ravel(), x2.ravel()]))).reshape(x1.shape), alpha=0.75, cmap= ListedColormap(('red', "green")) )
# plot the limits of age and salary
plt.xlim(np.min(x1), np.max(x1)) #np.max(x) -> x.max()
plt.ylim(x2.min(), x2.max())
# plotting all data points (real values)
# np.unique returns an array of the distinct labels, e.g. array([0, 1]) // enumerate pairs each one with its index: (0, 0) then (1, 1)
for i, j in enumerate(np.unique(y_set)): # y_set == y_train // i is the index (0 then 1), j is the class label (0 then 1)
    # y_set == j is a boolean mask, so x_set[y_set == j, 0] keeps only column 0 (age) of the rows belonging to class j
    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label=i)
plt.xlabel("Age")
plt.ylabel("Salary")
plt.legend()
plt.show()
Larz60+ wrote May-23-2021, 09:40 AM:
Fixed code tags on the top script

Attached Files

.py   logistic_regression.py (Size: 2.92 KB / Downloads: 246)
.csv   data.csv (Size: 10.67 KB / Downloads: 252)
#2
You can look up enumerate in the Python docs.

But the idea is if you want to "count" or "index" an iterable, it will count it for you. As an example:

l = ["a", "list", "of", "words"]
for element in l:
    print(element)
Output:
a
list
of
words
Now we want to know the position of each. We could create an index variable and increment it each time through the loop. But enumerate() will do that for us.

l = ["a", "list", "of", "words"]
for element in enumerate(l):
    print(element)
Output:
(0, 'a')
(1, 'list')
(2, 'of')
(3, 'words')
Enumerate has taken the list element and the index number and put them together in a tuple. Instead of assigning the tuple to a variable, we could assign the parts to two separate variables:

l = ["a", "list", "of", "words"]
for index, word in enumerate(l):
    print(f"'{word}' is in position {index}")
Output:
'a' is in position 0
'list' is in position 1
'of' is in position 2
'words' is in position 3
So in your loop, i is set to the index, and j is set to the value, of each element in the numpy array that np.unique(y_set) returns.
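np.unique just returns the sorted distinct values in the array, so for a 0/1 target there are only two of them. A quick sketch with a made-up y_set (not your real data) shows what the loop sees:

import numpy as np

y_set = np.array([0, 1, 0, 0, 1, 1, 0])  # made-up labels standing in for y_train

print(np.unique(y_set))            # the distinct class labels, sorted

for i, j in enumerate(np.unique(y_set)):
    print(i, j)                    # i is the position in that array, j is the label itself
Output:
[0 1]
0 0
1 1
With only two classes, i and j happen to take the same values (0 and 0, then 1 and 1), but they mean different things: i is a counter, j is a class label.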

x_train is the training data, and it is assigned to x_set. It is 2-dimensional (one row per user, with age in column 0 and salary in column 1), so you index it with two things: x_set[rows, column]. y_set == j produces a boolean array that is True where the label equals j and False everywhere else; using that array as the row index keeps only the rows belonging to class j, and the 0 or 1 after the comma picks out the age or salary column for those rows.
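Here is a minimal sketch of that boolean indexing, with made-up numbers (three users, two features):

import numpy as np

x_set = np.array([[22, 20000],
                  [35, 60000],
                  [47, 43000]])    # made-up (age, salary) rows
y_set = np.array([0, 1, 1])        # made-up labels for those three rows

j = 1
print(y_set == j)                  # boolean mask: True where the label is 1
print(x_set[y_set == j, 0])        # ages of the class-1 rows only
print(x_set[y_set == j, 1])        # salaries of the class-1 rows only
Output:
[False  True  True]
[35 47]
[60000 43000]
That is exactly what feeds plt.scatter(): the ages of one class as the x values and their salaries as the y values.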

ListedColormap returns a callable (a function). So ListedColormap(('red', 'green')) creates and returns that function. So ListedColormap(('red', 'green'))(i) creates that function and then calls that function with value i. You could make it more explicit with something like:

func = ListedColormap(('red', 'green'))
color = func(i)
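If you want to see it for yourself, a quick check (the exact numbers may vary slightly, but the idea is that index 0 maps to red and index 1 maps to green):

from matplotlib.colors import ListedColormap

cmap = ListedColormap(('red', 'green'))
print(cmap(0))   # RGBA tuple for 'red',  roughly (1.0, 0.0, 0.0, 1.0)
print(cmap(1))   # RGBA tuple for 'green', roughly (0.0, 0.5, 0.0, 1.0)
So in the scatter call, color = ListedColormap(('red', 'green'))(i) just means "use red for the first class and green for the second".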