ValueError: Found input variables
#1
OS: Ubuntu 18.04
Python 3
Editors: PyCharm and Jupyter Lab

Hi all,
I looked at the existing thread with the same title, but my data seem to have the same length, and I still get this error.
Error:
Traceback (most recent call last):
  File "Regressor.py", line 103, in <module>
    rdg_regressor()
  File "Regressor.py", line 95, in rdg_regressor
    rdg.fit(rdg_poly_regression.fit_transform(salary_features_train), salary_labels_train)
  File "/home/ahmdwd/.local/lib/python3.6/site-packages/sklearn/linear_model/_ridge.py", line 766, in fit
    return super().fit(X, y, sample_weight=sample_weight)
  File "/home/ahmdwd/.local/lib/python3.6/site-packages/sklearn/linear_model/_ridge.py", line 547, in fit
    multi_output=True, y_numeric=True)
  File "/home/ahmdwd/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 765, in check_X_y
    check_consistent_length(X, y)
  File "/home/ahmdwd/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 212, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [24, 6]
Shapes printed for the features and labels:
Output:
(24, 1) (6, 1)
The full code is below for testing.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

# Importing the dataset
os.chdir('/home/ahmdwd/Documents/ML Lcture/Salay_Data')
salary_data_frame = pd.read_csv('Salary_Data.csv')
salary_features = salary_data_frame.iloc[:, :-1].values
salary_labels = salary_data_frame.iloc[:, -1].values

# Reshape Features and Labels
salary_features = salary_features.reshape(-1, 1)
salary_labels = salary_labels.reshape(-1, 1)
# print(salary_features)
# print(salary_labels)


from sklearn.model_selection import train_test_split

salary_features_train, salary_labels_train, salary_features_test, salary_labels_test = train_test_split(
    salary_features, salary_labels, test_size=0.2, random_state=0, shuffle=False)

# Fitting Linear Regression to the dataSet
from sklearn.linear_model import LinearRegression

linear_regression = LinearRegression()
linear_regression.fit(salary_features, salary_labels)

# Fitting Polynomial Regression to the dataSet
from sklearn.preprocessing import PolynomialFeatures

poly_regression = PolynomialFeatures(degree=10)
poly_salary_features = poly_regression.fit_transform(salary_features)
linear_poly_regression = LinearRegression()
linear_poly_regression.fit(poly_salary_features, salary_labels)

# Evaluation
from sklearn.metrics import r2_score

error_one = r2_score(salary_labels, linear_regression.predict(salary_features))
error_two = r2_score(salary_labels, linear_poly_regression.predict(poly_regression.fit_transform(salary_features)))
print(f'R Squared For Linear Regression Is : {error_one} ')
print(f'R Squared For Polynomial Linear Regression Is : {error_two} ')

# Fitting Polynomial Regression to the dataSet With Degree Of 7.
poly_regression_7 = PolynomialFeatures(degree=7)
poly_salary_features_7 = poly_regression_7.fit_transform(salary_features)
linear_poly_regression = LinearRegression()
linear_poly_regression.fit(poly_salary_features_7, salary_labels)

error_one_7 = r2_score(salary_labels, linear_regression.predict(salary_features))
error_two_7 = r2_score(salary_labels, linear_poly_regression.predict(poly_regression_7.fit_transform(salary_features)))
print(f'R Squared For Linear Regression With Degree " 7 " Is : {error_one_7} ')
print(f'R Squared For Polynomial Linear Regression With Degree " 7 " Is : {error_two_7} ')


# Fitting Polynomial Regression to the dataSet With Degree as set Of List.
def best_degree_range():
    degree_list = []
    error_list = []
    for dgr in range(1, 21):
        degree_list.append(dgr)
        poly_regression_dgr = PolynomialFeatures(degree=dgr)
        poly_salary_features_dgr = poly_regression_dgr.fit_transform(salary_features)

        linear_poly_regression_degree = LinearRegression()
        linear_poly_regression_degree.fit(poly_salary_features_dgr, salary_labels)

        # Evaluation
        error_poly = r2_score(salary_labels,
                              linear_poly_regression_degree.predict(poly_regression_dgr.fit_transform(salary_features)))
        error_list.append(error_poly)

    error_list_max = max(error_list)
    print(error_list_max)

    for e, d in zip(error_list, degree_list):
        if e == error_list_max:
            print(f'Highest R Squared is {e}, and Degree For It Is {d}')
            best_degree = d
            print('----------------')
            return best_degree


best_degree_range()

# Fitting Polynomial Regression to the dataSet With Ridge Regression and Alpha is (1).
from sklearn.linear_model import Ridge
def rdg_regressor():
    print(salary_features_train.shape)
    print(salary_labels_train.shape)
    best_degree = best_degree_range()
    rdg = Ridge(alpha=1, normalize=True)
    rdg_poly_regression = PolynomialFeatures(degree=best_degree)

    rdg.fit(rdg_poly_regression.fit_transform(salary_features_train), salary_labels_train)

    plt.title('Alpha = 1')
    plt.plot(salary_labels_train, '.', rdg.predict(rdg_poly_regression.fit_transform(salary_features_train)), '-o')
    plt.show()
    print('------------------')


rdg_regressor()
#2
It seems the unlucky line in your code is line #13; try commenting it out.
Your dataset has 6 rows, since the shape of the label array is (6, 1). Reshaping the feature matrix to (-1, 1) was the mistake. Originally the feature matrix has shape (6, 4) (4 features, 6 rows); after reshaping it became (6*4, 1), because passing -1 to reshape means "find the appropriate dimension that keeps the same number of elements as the original array". So originally salary_features.shape was (6, 4), and after applying salary_features.reshape(-1, 1) you got salary_features.shape = (24, 1).
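To illustrate, here is a minimal sketch of that reshape effect, using a made-up (6, 4) feature matrix (the actual contents of Salary_Data.csv are not shown in the thread, so the numbers here are placeholders):

import numpy as np

# Hypothetical stand-in for the feature matrix: 6 rows, 4 feature columns
features = np.arange(24).reshape(6, 4)
labels = np.arange(6).reshape(-1, 1)        # (6, 1): one label per row

print(features.shape)                       # (6, 4)
print(features.reshape(-1, 1).shape)        # (24, 1): all 6*4 values stacked into one column

# scikit-learn's check_consistent_length compares the first dimension of X and y,
# so fitting with a (24, 1) X and a (6, 1) y raises:
# ValueError: Found input variables with inconsistent numbers of samples: [24, 6]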
#3
(Mar-03-2020, 02:08 PM)scidam Wrote: It seems the unlucky line in your code is line #13; try commenting it out.
Your dataset has 6 rows, since the shape of the label array is (6, 1). Reshaping the feature matrix to (-1, 1) was the mistake. Originally the feature matrix has shape (6, 4) (4 features, 6 rows); after reshaping it became (6*4, 1), because passing -1 to reshape means "find the appropriate dimension that keeps the same number of elements as the original array". So originally salary_features.shape was (6, 4), and after applying salary_features.reshape(-1, 1) you got salary_features.shape = (24, 1).

First, thanks for the support, my friend.
I tried commenting out lines 13 and 14, but I still get the same result. I guess it is about the Ridge regressor, because the file runs fine until this call at line 86:
best_degree_range()
#4
Thank you, pal. I found that the variable names were switched in the train_test_split unpacking.
It was:
salary_features_train, salary_labels_train, salary_features_test, salary_labels_test

and it must be:
salary_features_train, salary_features_test, salary_labels_train, salary_labels_test
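For reference, a minimal sketch of the corrected unpacking order (train_test_split returns X_train, X_test, y_train, y_test, in that order); the 30-row dummy data below is an assumption chosen only so that the shapes match the (24, 1) / (6, 1) seen above:

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data standing in for the salary dataset (30 rows assumed)
salary_features = np.arange(30).reshape(-1, 1)   # (30, 1)
salary_labels = np.arange(30).reshape(-1, 1)     # (30, 1)

# Correct unpacking order: X_train, X_test, y_train, y_test
salary_features_train, salary_features_test, salary_labels_train, salary_labels_test = train_test_split(
    salary_features, salary_labels, test_size=0.2, random_state=0, shuffle=False)

# Both train arrays now have the same number of samples
print(salary_features_train.shape, salary_labels_train.shape)   # (24, 1) (24, 1)
print(salary_features_test.shape, salary_labels_test.shape)     # (6, 1) (6, 1)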