Python Forum
What are these Python lines for? What are they doing?
#1
In the following code I need to know exactly what the two lines of code are doing:

#!/usr/bin/env python
# coding: utf-8

# In[1]:

import numpy as np
import pandas as pd


# In[2]:


data = pd.read_csv("concrete_data.csv")
data.head()


# In[3]:


X = data.iloc[:, :8].values
Y = data.iloc[:, 8].values.reshape(-1,1)


# In[4]:


print(np.shape(X))
print(np.shape(Y))


# In[5]:


from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = .2, random_state=2021)


# In[6]:


from xgboost import XGBRegressor
xgb_model = XGBRegressor(random_state = 2021)


# In[7]:


# make a dictionary of hyperparameter values to search
search_space = {
    "n_estimators" : [100, 200, 500],
    "max_depth" : [3, 6, 9],
    "gamma" : [0.01, 0.1],
    "learning_rate" : [0.001, 0.01, 0.1, 1]
}


# In[8]:


from sklearn.model_selection import GridSearchCV
# make a GridSearchCV object
GS = GridSearchCV(estimator = xgb_model,
                  param_grid = search_space,
                  scoring = ["r2", "neg_root_mean_squared_error"],  # valid names: sklearn.metrics.get_scorer_names()
                  refit = "r2",
                  cv = 5,
                  verbose = 1)


# In[9]:


GS.fit(X_train, Y_train)


# In[10]:


print(GS.best_estimator_) # to get the complete details of the best model


# In[11]:


print(GS.best_params_) # to get only the best hyperparameter values that we searched for


# In[12]:


print(GS.best_score_) # score according to the metric we passed in refit


# In[13]:


df = pd.DataFrame(GS.cv_results_)
df = df.sort_values("rank_test_r2")
df.to_csv("cv_results.csv", index = False)
In lines 15 and 16 (the X = ... and Y = ... assignments), I know that they are selecting columns. I just do not know which columns they are selecting.

Any help appreciated.

Respectfully,

Led_Zeppelin

I have now added three sections. It is a screenshot of the whole program and its output. I thought it might be helpful in answering my question.

Reply
#2
The pandas documentation is pretty good. This link has several nice examples that will answer your questions about iloc indexing.

https://pandas.pydata.org/pandas-docs/st....iloc.html

You should run their examples and play around with the index numbers and slices until you really understand what iloc is doing.

I don't like your example, so I'll make a better one.
import pandas as pd


def evaluate(comment, code):
    print(comment, code)
    print(eval(code))
    print()


df = pd.DataFrame({i: [i * p for p in range(6)] for i in range(1, 6)})
evaluate("Entire dataframe", "df")
evaluate("First three rows", "df.iloc[:3]")
evaluate("Rows 1 and 2", "df.iloc[1:3]")
evaluate("Column 2 (third column)", "df.iloc[:, 2]")
evaluate("First three columns", "df.iloc[:, :3]")
evaluate("First two rows of columns 1, 2, 3", "df.iloc[:2, 1:4]")
Output:
Entire dataframe df
   1   2   3   4   5
0  0   0   0   0   0
1  1   2   3   4   5
2  2   4   6   8  10
3  3   6   9  12  15
4  4   8  12  16  20
5  5  10  15  20  25

First three rows df.iloc[:3]
   1  2  3  4   5
0  0  0  0  0   0
1  1  2  3  4   5
2  2  4  6  8  10

Rows 1 and 2 df.iloc[1:3]
   1  2  3  4   5
1  1  2  3  4   5
2  2  4  6  8  10

Column 2 (third column) df.iloc[:, 2]
0     0
1     3
2     6
3     9
4    12
5    15
Name: 3, dtype: int64

First three columns df.iloc[:, :3]
   1   2   3
0  0   0   0
1  1   2   3
2  2   4   6
3  3   6   9
4  4   8  12
5  5  10  15

First two rows of columns 1, 2, 3 df.iloc[:2, 1:4]
   2  3  4
0  0  0  0
1  2  3  4
Do you also have questions about .values and reshape()?
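In case it helps, here is a minimal sketch of what .values and reshape(-1, 1) do to a single column. The DataFrame and the column name "strength" are made up for illustration; they are not taken from concrete_data.csv.

```python
import pandas as pd

# Made-up single-column frame standing in for the target column of the CSV
df = pd.DataFrame({"strength": [10.0, 20.0, 30.0]})

col = df.iloc[:, 0]           # integer column index -> pandas Series
arr = col.values              # underlying NumPy array, 1-D
col_2d = arr.reshape(-1, 1)   # -1 means "infer this dimension": one column, as many rows as needed

print(arr.shape)
print(col_2d.shape)
```

This prints (3,) and then (3, 1): the same three numbers, first as a flat array and then as a single-column 2-D array.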
Reply
#3
I noticed that there is more space inside the brackets in line 15 than in line 16. Why is that?

Respectfully,

Led_Zeppelin
Reply
#4
Quote:I noticed that there is more space inside the brackets in line 15 than in line 16. Why is that?
You are mistaken. Spacing is the same in both lines.
X = data.iloc[:, :8].values
Y = data.iloc[:, 8].values.reshape(-1,1)
#               ^ one space
Did you look at my example? I demonstrated the difference between lines 15 and 16 here:
evaluate("First three columns", "df.iloc[:, :3]")   # Does the same thing as line 15
evaluate("Column 2 (third column)", "df.iloc[:, 2]")  # Does the same thing as line 16
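To make the difference between the two bracket contents concrete, here is a small sketch on a made-up 9-column DataFrame. The column names c0 to c8 and the numbers are assumptions for illustration only, not the contents of concrete_data.csv. The extra character in line 15 is the colon, which turns the column index into a slice.

```python
import pandas as pd

# Made-up 4-row, 9-column frame; names c0..c8 are hypothetical
data = pd.DataFrame(
    [[r * 10 + c for c in range(9)] for r in range(4)],
    columns=[f"c{c}" for c in range(9)],
)

features = data.iloc[:, :8]   # ":8" is a slice: columns 0 through 7, result is a DataFrame
target = data.iloc[:, 8]      # "8" is a single integer: only column 8, result is a Series

print(type(features).__name__, features.shape)
print(type(target).__name__, target.shape, target.name)
```

The first print shows a DataFrame of shape (4, 8) and the second a Series of shape (4,) named c8, so ":8" means "every column before position 8" while "8" means "the column at position 8".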
Reply
#5
Now wait a second. You are correct, but I think that you read my question incorrectly. I meant the total space between the brackets,
not the space that you referred to. The total space between these []. It is not the same for both lines.

Why is it different? That is the whole question.

Respectfully,

Led Zeppelin
Reply
#6
I answered the question in my last two posts. I even provided examples that printed the results for comparison.
Reply
#7
Okay, this dataframe has a header. It is obviously not used in numerical calculations. What line of code deletes or ignores it?

Respectfully,

James M. Yunker
Reply
#8
The row and column indices are not values. DataFrame.values (or Series.values) returns the values without the indices.
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 4, 6]})
print("DataFrame", df, sep="\n", end="\n\n")
print("Values", df.values, sep="\n", end="\n\n")
Output:
DataFrame
   A  B
0  1  2
1  2  4
2  3  6

Values
[[1 2]
 [2 4]
 [3 6]]
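The same idea applies to a file read with pd.read_csv: by default the first row of the file becomes the column labels, so it never appears among the values. Here is a rough sketch using an in-memory CSV; the two column names and the numbers are invented for illustration and are not the real contents of concrete_data.csv.

```python
import io

import pandas as pd

# Invented CSV text: one header row followed by two data rows
csv_text = "cement,strength\n100.0,20.5\n200.0,35.0\n"

# With default arguments, read_csv uses the first row as column labels
data = pd.read_csv(io.StringIO(csv_text))

print(list(data.columns))   # the header row ends up here
print(data.shape)           # only the data rows are counted
print(data.values)          # plain numeric array, no labels
```

The shape printed is (2, 2), not (3, 2): the header row is stored as data.columns, and .values contains only the numbers.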
Reply

