Python Forum
ValueError: Found array with 0 samples
#1
Hey guys! First of all, let me say that I am completely new to this. I am working on my capstone project and have been trying to learn Python, but things are going downhill, haha. I need to train a model to produce a demand forecast based on previous sales. I am using Spyder (via Anaconda) and I am getting an error that I have no idea how to fix.

The error is: "ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required."
It seems that the error happens in the "#SEGUNDO TREINO DE ERRO" (second error training) part. In that part I need to "train" the model to decrease the RMSLE.

Here is my code:

# IMPORT LIBRARIES
import pandas as pd
import numpy as np
from IPython import get_ipython
ipy = get_ipython()
if ipy is not None:
    ipy.run_line_magic('matplotlib', 'inline')

from sklearn.metrics import mean_squared_log_error
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor

# IMPORT FILE
data = pd.read_csv(r"C:\Users\Marcella\Documents\FEI\9 ciclo\TCC1\Banco de dados\Empresa Leo\SKU_csv2.csv", sep = ';')
df = pd.DataFrame(data)

# CREATE "PERIOD" COLUMN FROM "YEAR" AND "MONTH"
data["Period"] = data["Year"].astype(str) + "-" + data["Month"].astype(str) 

# We use the datetime formatting to make sure format is consistent 
data["Period"] = pd.to_datetime(data["Period"]).dt.strftime("%Y-%m")

data3 = data.filter(regex=r'Code|Timeline|Quantity')
data3.head()

# REVERSE THE ORDER OF THE TABLE
df = pd.DataFrame(data3)
dfOrdenado = df.sort_values(by = 'Code', ascending = True)
dfOrdenado.head()


# VOLUME DIFFERENCE BETWEEN CURRENT AND PREVIOUS TIMELINE (CURRENT MONTH - PREVIOUS MONTH)

data2 = dfOrdenado.copy()
data2['Last_Month_Quantity'] = data2.groupby(['Code'])['Quantity'].shift(-1)
data2['Last_Month_Diff'] = data2.groupby(['Code'])['Last_Month_Quantity'].diff()
data2 = data2.dropna()
data2.head()

# FIRST ERROR TRAINING
def rmsle(ytrue, ypred):
    return np.sqrt(mean_squared_log_error(ytrue, ypred))

mean_error = []
for Timeline in range(1,36):
    train = data2[data2['Timeline'] < Timeline]
    val = data2[data2['Timeline'] == Timeline]
    
    p = val['Last_Month_Quantity'].values

    error = rmsle(val['Quantity'].values, p)
    print('Timeline %d - Error %.5f' % (Timeline, error))
    mean_error.append(error)
print('Mean Error = %.5f' % np.mean(mean_error))

# ERROR HISTOGRAM
data2['Quantity'].hist(bins=20, figsize=(10,5))


# SEGUNDO TREINO DE ERRO (SECOND ERROR TRAINING)
mean_error = []
for Timeline in range(1,36):
    train = data2[data2['Timeline'] < Timeline]
    val = data2[data2['Timeline'] == Timeline]

    xtr, xts = train.drop(['Quantity'], axis=1), val.drop(['Quantity'], axis=1)
    ytr, yts = train['Quantity'].values, val['Quantity'].values

    mdl = RandomForestRegressor(n_estimators=1000, n_jobs=-1, random_state=0)
    mdl.fit(xtr, ytr)

    p = mdl.predict(xts)

    error = rmsle(yts, p)
    print('Timeline %d - Error %.5f' % (Timeline, error))
    mean_error.append(error)
print('Mean Error = %.5f' % np.mean(mean_error))
And here is the Output:

IPython 7.12.0 -- An enhanced Interactive Python.

Timeline 1 - Error 2.70350
Timeline 2 - Error 1.61701
Timeline 3 - Error 3.18454
Timeline 4 - Error 2.40659
Timeline 5 - Error 1.45284
Timeline 6 - Error 0.69815
Timeline 7 - Error 1.02462
Timeline 8 - Error 1.93734
Timeline 9 - Error 0.48172
Timeline 10 - Error 1.87422
Timeline 11 - Error 2.91395
Timeline 12 - Error 2.15465
Timeline 13 - Error 2.24474
Timeline 14 - Error 1.58562
Timeline 15 - Error 1.24788
Timeline 16 - Error 0.20848
Timeline 17 - Error 0.72884
Timeline 18 - Error 0.10210
Timeline 19 - Error 0.55287
Timeline 20 - Error 2.73459
Timeline 21 - Error 1.87676
Timeline 22 - Error 3.05041
Timeline 23 - Error 0.97720
Timeline 24 - Error 1.62730
Timeline 25 - Error 1.85567
Timeline 26 - Error 2.42298
Timeline 27 - Error 0.91488
Timeline 28 - Error 0.88662
Timeline 29 - Error 2.16283
Timeline 30 - Error 1.81922
Timeline 31 - Error 1.46269
Timeline 32 - Error 0.53905
Timeline 33 - Error 0.27669
Timeline 34 - Error 1.87140
Timeline 35 - Error 1.87198
Mean Error = 1.58486
Traceback (most recent call last):

  File "<ipython-input-1-587546307fe9>", line 70, in <module>
    mdl.fit(xtr, ytr)

  File "C:\Users\Marcella\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 295, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)

  File "C:\Users\Marcella\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 586, in check_array
    context))

ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required.
Could anyone help me with this? Thank you so much in advance!
#2
Could you post the actual error message in its entirety? There is often more specific information that can help to sort this out.
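In the meantime, a guess based on the traceback you posted: shape=(0, 4) suggests the training slice itself is empty. With range(1,36), the first pass selects rows where Timeline < 1, and if Timeline starts at 1 that matches nothing. The first loop would not notice, because it never fits a model on train. Here is a minimal sketch with made-up data showing the empty slice and one possible guard. The names toy and walk_forward_splits are hypothetical, not from your script, and the numbers are invented for illustration.

```python
import pandas as pd

# Toy stand-in for data2; the values are made up for illustration only.
toy = pd.DataFrame({
    "Code": [1, 1, 1, 2, 2, 2],
    "Timeline": [1, 2, 3, 1, 2, 3],
    "Quantity": [10, 12, 9, 40, 38, 41],
})

# Same filter as the first loop iteration: nothing is earlier than Timeline 1.
first_train = toy[toy["Timeline"] < 1]
print(first_train.shape)  # zero rows, which is what fit() rejects


def walk_forward_splits(frame, time_col="Timeline"):
    """Yield (period, train, val) and skip periods with no earlier rows."""
    for period in sorted(frame[time_col].unique()):
        train = frame[frame[time_col] < period]
        val = frame[frame[time_col] == period]
        if train.empty:
            # Nothing to learn from yet, so do not call fit() for this period.
            print(f"{time_col} {period}: skipped, no earlier rows to train on")
            continue
        yield period, train, val


splits = list(walk_forward_splits(toy))
for period, train, val in splits:
    # A model would be fitted on train and scored on val at this point.
    print(period, len(train), len(val))
```

If this matches your data, starting the second loop at range(2,36) or adding a similar check before mdl.fit(xtr, ytr) should avoid the error. Printing train.shape inside your loop would confirm whether this is really what is happening.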