Python Forum
ValueError: Found input variables with inconsistent numbers of samples: [5, 6]
I am working with a small test dataset and I am getting ValueError: Found input variables with inconsistent numbers of samples: [5, 6]. How can I make X and y have the same number of samples? I added the line

dataset.dropna(inplace=True)

to drop NA values so that the two would end up the same size. However, I still get the ValueError. The code is:
# Importing Libraries 
import numpy as np   
import pandas as pd  
  
# Import dataset 
dataset = pd.read_csv("../output.tsv", delimiter = '\t')


# library to clean data 
import re  
  
# Natural Language Tool Kit 
import nltk  
  
nltk.download('stopwords') 
  
# to remove stopword 
from nltk.corpus import stopwords 
  
# for stemming purposes
from nltk.stem.porter import PorterStemmer 
  
# Initialize empty array 
# to append clean text  
corpus = []  
  
# clean the first 5 reviews (rows 0 to 4)
for i in range(0, 5):  
      
    # column : "Review", row ith 
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i]) 
    
      
    # convert all cases to lower cases 
    review = review.lower()  
      
    # split to array(default delimiter is " ") 
    review = review.split()  
      
    # creating PorterStemmer object to 
    # take main stem of each word 
    ps = PorterStemmer()  
      
    # loop for stemming each word 
    # in string array at ith row     
    review = [ps.stem(word) for word in review 
                if not word in set(stopwords.words('english'))]  
                  
    # rejoin all string array elements 
    # to create back into a string 
    review = ' '.join(review)   
      
    # append each string to create 
    # array of clean text  
    corpus.append(review)

# Creating the Bag of Words model 
from sklearn.feature_extraction.text import CountVectorizer 
  
# Keep at most 9 features.
# "max_features" is a parameter to
# experiment with to get better results
cv = CountVectorizer(max_features = 9)  
  
# X contains the bag-of-words features built from corpus (independent variable)
X = cv.fit_transform(corpus).toarray()  
  
# y contains answers if review 
# is positive or negative 
y = dataset.iloc[:, 1].values 

# Splitting the dataset into 
# the Training set and Test set 
from sklearn.model_selection import train_test_split


dataset.dropna(inplace=True)
print(X.shape)
print(y.shape)


# experiment with "test_size" 
# to get better results 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
print(X_train.shape)
print(y_train.shape)
The output from the code (for X.shape and y.shape) is
(5, 9)
(6,)

The error is ValueError: Found input variables with inconsistent numbers of samples: [5, 6]
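
Judging only from the code as posted, a likely cause is that the cleaning loop runs over range(0, 5) and so puts 5 reviews into corpus, while y is taken from every row of dataset (6 rows). The dropna call also comes after X and y have already been built, so it cannot change their shapes. A minimal sketch of one way to keep the two in step is below. It assumes a small in-memory table in place of output.tsv (the sample reviews and the column name "Liked" are made up for illustration), and it leaves out the stemming and stop-word steps to stay short.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for output.tsv: six rows, a "Review" text column
# followed by a label column (the name "Liked" is made up for this sketch).
dataset = pd.DataFrame({
    "Review": ["Loved this place", "Crust is not good", "Not tasty at all",
               "Great service", "The food was amazing", "Would not go back"],
    "Liked": [1, 0, 0, 1, 1, 0],
})

# Drop incomplete rows BEFORE building X and y, then renumber the index
# so that label-based access such as dataset["Review"][i] has no gaps.
dataset = dataset.dropna().reset_index(drop=True)


def clean(text):
    # Keep letters only and lower-case the text; stemming and stop-word
    # removal from the original post are omitted here for brevity.
    return " ".join(re.sub("[^a-zA-Z]", " ", text).lower().split())


# Iterate over the column itself instead of a hard-coded range(0, 5),
# so corpus always has exactly one entry per remaining row.
corpus = [clean(text) for text in dataset["Review"]]

cv = CountVectorizer(max_features=9)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values

# Both arrays now come from the same rows, so the sample counts agree.
assert X.shape[0] == y.shape[0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(X.shape, y.shape)
```

If only the first few rows are wanted, slicing the DataFrame once (for example dataset = dataset.iloc[:5]) before building both corpus and y should have the same effect, since X and y would still be derived from the same rows.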

Messages In This Thread
ValueError: Found input variables with inconsistent numbers of samples: [5, 6] - by bongielondy - Nov-07-2019, 03:26 AM
