Need help with Scikit-Learn Assignment

Toh · (This post was last modified: Feb-10-2025, 08:46 AM by buran.)

What is wrong with my code? Why does it keep failing?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
My failure result from the AI check

Output:
# Print to check results
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared: {r_squared}")

Hidden Tests Redacted

One or more test cases in this cell did not pass.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

THE ASSIGNMENT IS AS FOLLOWS

High-Level Tasks
Load and Explore the Data
Data Preprocessing
Build and Train a Linear Regression Model
Make Predictions and Evaluate the Model
Bonus Challenge (Optional)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Lab Instructions
1. Load and Explore the Data
Step 1.1: Import the required Python library and load dataset.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("house_prices.csv")

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 1.2: Display the First 5 Rows
Use the provided code cell to display the first 5 rows of the dataset.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

df.head()

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 1.3: Examine Column Names and Data Types
Inspect the column names and data types using df.info().
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

df.info()

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 1.4: Get Summary Statistics
Get summary statistics of numerical columns using df.describe() and df.dtypes.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

df.describe()
df.dtypes

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2. Data Preprocessing
Step 2.1: Handle Missing Values
Identify and handle any missing values. You could choose to drop rows with missing values or fill them with appropriate statistics (mean, median, etc.). For this activity, fill the missing values with the median to retain as much data as possible without introducing bias.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler


df = pd.read_csv("house_prices.csv")
missing_values = df.isnull().sum()

print("Missing values in each column:\n", missing_values)

for column in df.select_dtypes(include=['float64', 'int64']).columns:
    median_value = df[column].median()
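    # Assign back instead of using a chained inplace fillna, which newer pandas versions may not apply to df
    df[column] = df[column].fillna(median_value)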

missing_values_after = df.isnull().sum()
print("Missing values after filling:\n", missing_values_after)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# ... (Your existing code for displaying info, describe, dtypes, and handling missing values in 'condition')
# Correct feature and target selection using actual column names
# Feature Scaling (Important!)
#Fit and Transform the training data
# (Optional) Example of inverse transforming the predictions if needed:
# y_pred_original_scale = scaler.inverse_transform(y_pred)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

X = df.drop('price', axis=1)  # Replace 'Price' with your actual target column name
y = df['price']

X = pd.get_dummies(X, drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 2.2: Select Relevant Features
Select the features (e.g., 'sqft_living', 'bedrooms', 'bathrooms', 'condition', 'floors') and the target variable ('price').
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Select relevant features and target variable
# Fill missing values in “condition” with median
# Show dataframe with filled values in “condition”

df = pd.read_csv('house_prices.csv')

features = ['sqft_living', 'bedrooms', 'bathrooms', 'condition', 'floors']
target = 'price'

X = df[features]
y = df[target]

print("Features (X):\n", X.head())
print("\nTarget (y):\n", y.head())

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 2.3: Encode Categorical Feature
Encode the categorical feature 'condition' using one-hot encoding.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

X_encoded = pd.get_dummies(X, columns=['condition'], drop_first=True)
print(X_encoded.head())

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 2.4: Split the Data
Split the data into training and testing sets (80% train, 20% test) using train_test_split from Scikit-Learn.

Make sure to set the random_state parameter to 42 to ensure reproducibility and obtain the same results as the expected solution.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded,  # Features DataFrame after encoding
    y,          # Target variable
    test_size=0.2,  # 20% for testing
    random_state=42  # For reproducibility
)

print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_test shape: {y_test.shape}")

print(f'Training set shape: {X_train.shape}, Testing set shape: {X_test.shape}')

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
3. Build and Train a Linear Regression Model
Step 3.1: Import LinearRegression
Import LinearRegression from sklearn.linear_model.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

from sklearn.linear_model import LinearRegression

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 3.2: Create an Instance of the Model
Create an instance of the LinearRegression model.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

model = LinearRegression()

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 3.3: Fit the Model
Fit the model to the training data.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

model.fit(X_train, y_train)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
4. Make Predictions and Evaluate the Model
Step 4.1: Make Predictions
Use the trained model to make predictions on the testing data.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

y_pred = model.predict(X_test)
predictions_df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
print(predictions_df.head())

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 4.2: Evaluate the Model
Calculate the Mean Squared Error (MSE) as mse and R-squared value as r_squared to evaluate the model's performance, then check your results by printing them in the following cell.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

r_squared = r2_score(y_test, y_pred)

print(f'Mean Squared Error (MSE): {mse}')
print(f'Root Mean Squared Error (RMSE): {rmse}')
print(f'R-squared (R²): {r_squared}')

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Check Your Results:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Print to check results
print(f"Mean Squared Error (MSE): {mse}")
print(f"R-squared: {r_squared}")

result:

Output:
Mean Squared Error (MSE): 71936315243.10368
R-squared: 0.31381685646629

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Step 5.1: Experiment with a Different Regression Algorithm
Experiment with a different regression algorithm (e.g., DecisionTreeRegressor or RandomForestRegressor) and compare its performance to the Linear Regression model using the same evaluation metrics.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import OneHotEncoder

tree_model = DecisionTreeRegressor(random_state=42)  # Important: set random_state
tree_model.fit(X_train, y_train)

y_pred_tree = tree_model.predict(X_test)


mse_linear = mean_squared_error(y_test, y_pred)
r_squared_linear = r2_score(y_test, y_pred)
print("Linear Regression:")
print(f"MSE: {mse_linear}")
print(f"R-squared: {r_squared_linear}")

mse_tree = mean_squared_error(y_test, y_pred_tree)
r_squared_tree = r2_score(y_test, y_pred_tree)
print("\nDecision Tree Regressor:")
print(f"MSE: {mse_tree}")
print(f"R-squared: {r_squared_tree}")

result:

Output:
Linear Regression:
MSE: 71936315243.10368
R-squared: 0.31381685646629

Decision Tree Regressor:
MSE: 117957015596.37538
R-squared: -0.12516349343505206

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
RandomForestRegressor
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor  # Import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import OneHotEncoder

forest_model = RandomForestRegressor(random_state=42)  # Set random_state
forest_model.fit(X_train, y_train)

y_pred_forest = forest_model.predict(X_test)

mse_linear = mean_squared_error(y_test, y_pred)
r_squared_linear = r2_score(y_test, y_pred)
print("Linear Regression:")
print(f"MSE: {mse_linear}")
print(f"R-squared: {r_squared_linear}")

# Random Forest Regressor
mse_forest = mean_squared_error(y_test, y_pred_forest)
r_squared_forest = r2_score(y_test, y_pred_forest)
print("\nRandom Forest Regressor:")
print(f"MSE: {mse_forest}")
print(f"R-squared: {r_squared_forest}")

result:

Output:
Linear Regression:
MSE: 71936315243.10368
R-squared: 0.31381685646629

Random Forest Regressor:
MSE: 84176299254.89598
R-squared: 0.19706260407470433

buran wrote Feb-10-2025, 08:46 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See the BBCode help for more info.

alexjordan · Feb-10-2025, 10:38 AM

(Feb-10-2025, 07:15 AM)Toh Wrote: what is wrong with my code why keep failing?


It looks like your code is correctly following the steps for building and evaluating a Linear Regression model in Scikit-Learn. However, your assignment is failing some hidden tests, which could be due to a few common issues. Here are some steps to debug and improve your model:

Possible Issues & Fixes
Check Data Preprocessing:

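Ensure that all missing values are handled correctly. Filling missing numerical values with the median is good, but check whether any categorical variables need special handling. Also note that Step 2.2 calls pd.read_csv('house_prices.csv') again, which replaces df and discards the median fill from Step 2.1, so X may still contain the original missing values (the Step 2.2 comments mention filling 'condition', but no code does it). That may well be why your numbers differ from the expected solution.
Verify that price is not mistakenly included in X after encoding categorical variables.
Confirm that df['condition'] is properly one-hot encoded before training.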
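To make the missing-value point concrete, here is a minimal sketch of filling with the median and then selecting features from the same DataFrame, without reloading the file in between. The toy data is made up and only stands in for house_prices.csv; the column names are taken from the assignment text, and the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for house_prices.csv
df = pd.DataFrame({
    "sqft_living": [1000.0, 1500.0, np.nan, 2000.0],
    "condition": [3.0, np.nan, 4.0, 5.0],
    "price": [200000.0, 300000.0, 350000.0, 400000.0],
})

# Fill every numeric column with its own median and assign the result back
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Select features from the SAME df (no second pd.read_csv here)
features = ["sqft_living", "condition"]
X = df[features]
y = df["price"]

missing_after = int(X.isnull().sum().sum())
print("Missing values left in X:", missing_after)
```

If the file is read again after this point, the fill has to be repeated, otherwise X is built from the unfilled data.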
Feature Scaling Issue:

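You compute X_train_scaled and X_test_scaled with StandardScaler(), but the model is later fit on the unscaled X_train. If you scale, do it after encoding the categorical features, and fit the scaler on the training set only to avoid data leakage:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
Then use X_train_scaled and X_test_scaled for training and testing. Keep in mind that for plain LinearRegression, standardizing changes the coefficients but not the predictions, so this step alone should not change mse or r_squared.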
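A quick way to see what scaling does (and does not do) for ordinary least squares is to fit the same model on raw and standardized features and compare the test predictions. This is only a sketch on synthetic data; the feature scales and coefficients below are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): three features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1000.0, 2.0, 1.0])
y = X @ np.array([150.0, 20000.0, -5000.0]) + rng.normal(scale=1e4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

pred_raw = LinearRegression().fit(X_train, y_train).predict(X_test)
pred_scaled = LinearRegression().fit(X_train_scaled, y_train).predict(X_test_scaled)

# The coefficients differ, but the predictions agree up to floating-point error
same_predictions = np.allclose(pred_raw, pred_scaled)
print("Same predictions with and without scaling:", same_predictions)
```

Scaling matters much more for regularized or distance-based models; for plain linear regression it mainly makes the coefficients comparable.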
Check Model Performance:

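Your R² value (~0.31) means the model explains only about 31% of the variance in price, which is quite low.
To improve the model itself, try adding more features (e.g., sqft_basement, yr_built, if your dataset has them) or transforming features (a log transformation for skewed variables). Since the assignment names specific features, though, changing them will probably not help with the hidden tests.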
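As a rough illustration of the log-transformation idea, the sketch below applies np.log1p to a right-skewed column and compares the skewness before and after. The data is randomly generated and the column is hypothetical; whether a transform helps on the real dataset is something you would have to check yourself.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed feature (log-normal values, not the real data)
rng = np.random.default_rng(0)
sqft = pd.Series(rng.lognormal(mean=7.0, sigma=1.0, size=1000))

# log1p computes log(1 + x), which is also safe for zeros
sqft_log = np.log1p(sqft)

skew_raw = sqft.skew()
skew_log = sqft_log.skew()
print(f"Skew before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")

# expm1 is the inverse of log1p, so the original values can be recovered
restored = np.expm1(sqft_log)
```

If the target is transformed instead of a feature, remember to invert the predictions before computing MSE on the original price scale.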
Evaluate Hidden Test Differences:

The AI grading system might expect a specific format for your results. Try printing mse and r_squared with rounding:
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared: {r_squared:.4f}")
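One thing worth knowing here: a format specifier only changes the printed text, not the stored value. The small sketch below uses the two numbers from the original post; the assumption that a grader compares the variables rather than the printed text is mine and is not confirmed anywhere in the thread.

```python
# Values copied from the output in the original post
mse = 71936315243.10368
r_squared = 0.31381685646629

# Formatting produces new strings; mse and r_squared themselves are untouched
mse_text = f"{mse:.2f}"
r2_text = f"{r_squared:.4f}"

print(f"Mean Squared Error (MSE): {mse_text}")
print(f"R-squared: {r2_text}")
```

So if the hidden tests look at mse and r_squared directly, rounding in the print statement will not change the outcome.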
Try an Alternative Model:

If Linear Regression isn't performing well, test other models like DecisionTreeRegressor or RandomForestRegressor. Use new variable names so the graded model and y_pred are not overwritten:
from sklearn.tree import DecisionTreeRegressor
tree_model = DecisionTreeRegressor(random_state=42)
tree_model.fit(X_train, y_train)
y_pred_tree = tree_model.predict(X_test)
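For the comparison in the bonus step, something like the loop below keeps each model's scores separate so nothing gets overwritten. It is a sketch on synthetic data; the names candidates and results are hypothetical, and in your notebook you would pass in your own train and test splits instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); replace with your encoded X and y
rng = np.random.default_rng(42)
X = rng.uniform(500, 4000, size=(300, 2))
y = 150.0 * X[:, 0] + 50.0 * X[:, 1] + rng.normal(scale=20000.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(random_state=42),
}

# Fit each model on the same split and store its metrics under its own key
results = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    preds = estimator.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, preds),
        "r2": r2_score(y_test, preds),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.2f}, R-squared={scores['r2']:.4f}")
```

Keeping the results in a dictionary also makes it easy to add another model later without touching mse and r_squared from Step 4.2.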

I have applied this approach in one of my own projects, so hopefully it will help you out!
