What are these Python lines for? What are they doing?

Led_Zeppelin · (This post was last modified: Feb-08-2023, 06:34 PM by Led_Zeppelin.)

In the following code I need to know exactly what the two lines of code are doing:

#!/usr/bin/env python
# coding: utf-8

# In[1]:

import numpy as np
import pandas as pd


# In[2]:


data = pd.read_csv("concrete_data.csv")
data.head()


# In[3]:


X = data.iloc[:, :8].values
Y = data.iloc[:, 8].values.reshape(-1,1)


# In[4]:


print(np.shape(X))
print(np.shape(Y))


# In[5]:


from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = .2, random_state=2021)


# In[6]:


from xgboost import XGBRegressor
xgb_model = XGBRegressor(random_state = 2021)


# In[7]:


# make a dictionary of hyperparameter values to search
search_space = {
    "n_estimators" : [100, 200, 500],
    "max_depth" : [3, 6, 9],
    "gamma" : [0.01, 0.1],
    "learning_rate" : [0.001, 0.01, 0.1, 1]
}


# In[8]:


from sklearn.model_selection import GridSearchCV
# make a GridSearchCV object
GS = GridSearchCV(estimator = xgb_model,
                  param_grid = search_space,
                  scoring = ["r2", "neg_root_mean_squared_error"], # valid names: sklearn.metrics.get_scorer_names()
                  refit = "r2",
                  cv = 5,
                  verbose = 1)


# In[9]:


GS.fit(X_train, Y_train)


# In[10]:


print(GS.best_estimator_) # to get the complete details of the best model


# In[11]:


print(GS.best_params_) # to get only the best hyperparameter values that we searched for


# In[12]:


print(GS.best_score_) # score according to the metric we passed in refit


# In[13]:


df = pd.DataFrame(GS.cv_results_)
df = df.sort_values("rank_test_r2")
df.to_csv("cv_results.csv", index = False)

In lines 15 and 16 (the X = ... and Y = ... lines under In[3]), I know that they are selecting columns. I just do not know which columns they are selecting.

Any help appreciated.

Respectfully,

Led_Zeppelin

I have now added three sections. It is a screenshot of the whole program and its output. I thought it might be helpful in answering my question.

**deanhystad** · (This post was last modified: Feb-09-2023, 03:08 AM by deanhystad.)

The pandas documentation is pretty good. This link has several nice examples that will answer your questions about iloc indexing.

https://pandas.pydata.org/pandas-docs/st....iloc.html

You should run their examples and play around with the index numbers and slices until you really understand what iloc is doing.

I don't like your example, so I'll make a better one.

import pandas as pd


def evaluate(comment, code):
    print(comment, code)
    print(eval(code))
    print()


df = pd.DataFrame({i: [i * p for p in range(6)] for i in range(1, 6)})
evaluate("Entire dataframe", "df")
evaluate("First three rows", "df.iloc[:3]")
evaluate("Rows 1 and 2", "df.iloc[1:3]")
evaluate("Column 2 (third column)", "df.iloc[:, 2]")
evaluate("First three columns", "df.iloc[:, :3]")
evaluate("First two rows of columns 1, 2, 3", "df.iloc[:2, 1:4]")

Output:
Entire dataframe df
   1   2   3   4   5
0  0   0   0   0   0
1  1   2   3   4   5
2  2   4   6   8  10
3  3   6   9  12  15
4  4   8  12  16  20
5  5  10  15  20  25

First three rows df.iloc[:3]
   1  2  3  4   5
0  0  0  0  0   0
1  1  2  3  4   5
2  2  4  6  8  10

Rows 1 and 2 df.iloc[1:3]
   1  2  3  4   5
1  1  2  3  4   5
2  2  4  6  8  10

Column 2 (third column) df.iloc[:, 2]
0     0
1     3
2     6
3     9
4    12
5    15
Name: 3, dtype: int64

First three columns df.iloc[:, :3]
   1   2   3
0  0   0   0
1  1   2   3
2  2   4   6
3  3   6   9
4  4   8  12
5  5  10  15

First two rows of columns 1, 2, 3 df.iloc[:2, 1:4]
   2  3  4
0  0  0  0
1  2  3  4

Do you also have questions about .values and reshape()?

Led_Zeppelin · Feb-10-2023, 06:43 PM

I noticed that there is more space inside the brackets in line 15 than in line 16. Why is that?

Respectfully,

Led_Zeppelin

**deanhystad** · (This post was last modified: Feb-10-2023, 10:18 PM by deanhystad.)

Quote: I noticed that there is more space inside the brackets in line 15 than in line 16. Why is that?

You are mistaken. Spacing is the same in both lines.

X = data.iloc[:, :8].values
Y = data.iloc[:, 8].values.reshape(-1,1)
#               ^ one space

Did you look at my example? I demonstrated the difference between lines 15 and 16 here:

evaluate("First three columns", "df.iloc[:, :3]")   # Does the same thing as line 15
evaluate("Column 2 (third column)", "df.iloc[:, 2]")  # Does the same thing as line 16
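
To tie this back to the original two lines, here is a small sketch. The nine-column frame and the column names c0 to c8 are made up to stand in for concrete_data.csv, so treat it as an illustration, not the real data. The only difference between the brackets is the extra colon: `:8` is a slice that keeps the columns at positions 0 through 7 as a 2-D table, while a bare `8` picks the single column at position 8 as a 1-D Series. That is also why the second line needs reshape(-1, 1).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for concrete_data.csv: 5 rows, 9 columns named c0..c8.
data = pd.DataFrame(np.arange(45).reshape(5, 9),
                    columns=[f"c{i}" for i in range(9)])

features = data.iloc[:, :8]  # slice: all rows, columns at positions 0..7
target = data.iloc[:, 8]     # integer: all rows, only the column at position 8

print(type(features).__name__, list(features.columns))
print(type(target).__name__, target.name)

X = features.values               # 2-D NumPy array
Y = target.values.reshape(-1, 1)  # 1-D array turned into one column of a 2-D array

print(X.shape)              # (5, 8)
print(target.values.shape)  # (5,)
print(Y.shape)              # (5, 1)
```

In reshape(-1, 1), the -1 asks NumPy to work out the number of rows itself, and the 1 fixes the number of columns at one.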

Led_Zeppelin · Feb-12-2023, 10:38 PM

Now wait a second. You are correct, but I think that you read my question incorrectly. I meant the total space between the brackets,
not the space that you referred to. The total space between these []. It is not the same for both lines.

Why is it different? That is the whole question.

Respectfully,

Led Zeppelin

**deanhystad** · Feb-13-2023, 03:19 AM

I answered the question in my last two posts. I even provided examples that printed the results for comparison.

Led_Zeppelin · Feb-13-2023, 01:17 PM

Okay, this dataframe has a header. It is obviously not used in numerical calculations. What line of code deletes or ignores it?

Respectfully,

James M. Yunker

**deanhystad** · (This post was last modified: Feb-13-2023, 03:08 PM by deanhystad.)

The row and column indices are not values. DataFrame.values (or Series.values) returns the values without the indices.

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 6]})
print("DataFrame", df, sep="\n", end="\n\n")
print("Values", df.values, sep="\n", end="\n\n")

Output:
DataFrame
   A  B
0  1  2
1  2  4
2  3  6

Values
[[1 2]
 [2 4]
 [3 6]]
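
The same thing happens in the original script, in two steps. By default pd.read_csv() treats the first line of the file as column names and not as data, and .values then leaves those labels behind. A small sketch, using an in-memory CSV with made-up column names and numbers in place of concrete_data.csv:

```python
import io

import pandas as pd

# Made-up CSV text standing in for concrete_data.csv (names and numbers are illustrative).
csv_text = "cement,water,strength\n540.0,162.0,79.99\n332.5,228.0,40.27\n"

data = pd.read_csv(io.StringIO(csv_text))  # first line becomes the column labels

print(list(data.columns))  # the header ends up here, not in the data
print(data.shape)          # (2, 3): the header line is not counted as a row
print(data.values)         # plain NumPy array of numbers, no labels
```

So no line deletes the header. It is kept as labels on the DataFrame and is never part of the arrays passed to train_test_split().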
