Python Forum
ValueError: Found array with 0 samples
#1
Hey guys! First of all, let me say that I am completely new to this. I am working on my capstone project and have been trying to learn Python, but things are going downhill, haha. I need to train a model that creates a demand forecast based on previous sales. I am using Spyder (via Anaconda), and I am getting an error that I have no idea how to fix.

The error is: "ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required."
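For context, scikit-learn's input validation raises this error whenever `fit` receives a feature matrix with no rows; `shape=(0, 4)` reads as 0 samples and 4 feature columns. Below is a minimal reproduction using an artificial empty array rather than the original data. The exact wording of the message varies slightly between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_on_empty():
    """Fit a forest on 0 rows x 4 columns and return the error text, if any."""
    X_empty = np.empty((0, 4))  # 0 samples, 4 features: same shape as in the error
    y_empty = np.empty(0)
    try:
        RandomForestRegressor(n_estimators=10, random_state=0).fit(X_empty, y_empty)
    except ValueError as exc:
        return str(exc)
    return None


message = fit_on_empty()
print(message)
```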
It seems that the error happens in the "SEGUNDO TREINO DE ERRO" (second error training) part. In that part I need to "train" the model to decrease the RMSLE (root mean squared logarithmic error).
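RMSLE compares forecasts and actual values on a log scale: RMSLE = sqrt(mean((log(1 + predicted) - log(1 + actual))^2)). Because log(1 + p) - log(1 + y) = log((1 + p) / (1 + y)), it effectively scores the ratio between forecast and actual, so missing by 10 units costs more for a product that sells 5 units than for one that sells 500. The sketch below writes the formula out by hand next to the same `rmsle` helper the script defines; `rmsle_manual` is a hypothetical name and the quantities are made up. The log1p transform assumes non-negative quantities, which fits sales volumes.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error


def rmsle(ytrue, ypred):
    """Same helper as in the script: square root of sklearn's mean squared log error."""
    return np.sqrt(mean_squared_log_error(ytrue, ypred))


def rmsle_manual(ytrue, ypred):
    """RMSLE written out: sqrt(mean((log(1 + pred) - log(1 + true))**2))."""
    ytrue = np.asarray(ytrue, dtype=float)
    ypred = np.asarray(ypred, dtype=float)
    return np.sqrt(np.mean((np.log1p(ypred) - np.log1p(ytrue)) ** 2))


# Hypothetical monthly quantities, not taken from the original data.
actual = [10, 0, 25, 7]
predicted = [12, 1, 20, 7]
print(rmsle(actual, predicted), rmsle_manual(actual, predicted))
```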

Here is my code:

# IMPORT LIBRARIES
import pandas as pd
import numpy as np
from IPython import get_ipython
ipy = get_ipython()
if ipy is not None:
    ipy.run_line_magic('matplotlib', 'inline')

from sklearn.metrics import mean_squared_log_error
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor

# IMPORT FILE
data = pd.read_csv(r"C:\Users\Marcella\Documents\FEI\9 ciclo\TCC1\Banco de dados\Empresa Leo\SKU_csv2.csv", sep = ';')
df = pd.DataFrame(data)

# CREATE A "PERIOD" COLUMN FROM "YEAR" AND "MONTH"
data["Period"] = data["Year"].astype(str) + "-" + data["Month"].astype(str) 

# We use the datetime formatting to make sure format is consistent 
data["Period"] = pd.to_datetime(data["Period"]).dt.strftime("%Y-%m")

data3 = data.filter(regex=r'Code|Timeline|Quantity')
data3.head()

# INVERT THE ORDER OF THE TABLE (SORT BY CODE)
df = pd.DataFrame(data3)
dfOrdenado = df.sort_values(by = 'Code', ascending = True)
dfOrdenado.head()


# VOLUME DIFFERENCE BETWEEN CURRENT AND PREVIOUS TIMELINE (CURRENT MONTH - PREVIOUS MONTH)

data2 = dfOrdenado.copy()
data2['Last_Month_Quantity'] = data2.groupby(['Code'])['Quantity'].shift(-1)
data2['Last_Month_Diff'] = data2.groupby(['Code'])['Last_Month_Quantity'].diff()
data2 = data2.dropna()
data2.head()

# PRIMEIRO TREINO DE ERRO (FIRST ERROR TRAINING)
def rmsle(ytrue, ypred):
    return np.sqrt(mean_squared_log_error(ytrue, ypred))

mean_error = []
for Timeline in range(1,36):
    train = data2[data2['Timeline'] < Timeline]
    val = data2[data2['Timeline'] == Timeline]
    
    p = val['Last_Month_Quantity'].values

    error = rmsle(val['Quantity'].values, p)
    print('Timeline %d - Error %.5f' % (Timeline, error))
    mean_error.append(error)
print('Mean Error = %.5f' % np.mean(mean_error))

# ERROR HISTOGRAM
data2['Quantity'].hist(bins=20, figsize=(10,5))


# SEGUNDO TREINO DE ERRO (SECOND ERROR TRAINING)
mean_error = []
for Timeline in range(1,36):
    train = data2[data2['Timeline'] < Timeline]
    val = data2[data2['Timeline'] == Timeline]

    xtr, xts = train.drop(['Quantity'], axis=1), val.drop(['Quantity'], axis=1)
    ytr, yts = train['Quantity'].values, val['Quantity'].values

    mdl = RandomForestRegressor(n_estimators=1000, n_jobs=-1, random_state=0)
    mdl.fit(xtr, ytr)

    p = mdl.predict(xts)

    error = rmsle(yts, p)
    print('Timeline %d - Error %.5f' % (Timeline, error))
    mean_error.append(error)
print('Mean Error = %.5f' % np.mean(mean_error))
And here is the output:

IPython 7.12.0 -- An enhanced Interactive Python.

Timeline 1 - Error 2.70350
Timeline 2 - Error 1.61701
Timeline 3 - Error 3.18454
Timeline 4 - Error 2.40659
Timeline 5 - Error 1.45284
Timeline 6 - Error 0.69815
Timeline 7 - Error 1.02462
Timeline 8 - Error 1.93734
Timeline 9 - Error 0.48172
Timeline 10 - Error 1.87422
Timeline 11 - Error 2.91395
Timeline 12 - Error 2.15465
Timeline 13 - Error 2.24474
Timeline 14 - Error 1.58562
Timeline 15 - Error 1.24788
Timeline 16 - Error 0.20848
Timeline 17 - Error 0.72884
Timeline 18 - Error 0.10210
Timeline 19 - Error 0.55287
Timeline 20 - Error 2.73459
Timeline 21 - Error 1.87676
Timeline 22 - Error 3.05041
Timeline 23 - Error 0.97720
Timeline 24 - Error 1.62730
Timeline 25 - Error 1.85567
Timeline 26 - Error 2.42298
Timeline 27 - Error 0.91488
Timeline 28 - Error 0.88662
Timeline 29 - Error 2.16283
Timeline 30 - Error 1.81922
Timeline 31 - Error 1.46269
Timeline 32 - Error 0.53905
Timeline 33 - Error 0.27669
Timeline 34 - Error 1.87140
Timeline 35 - Error 1.87198
Mean Error = 1.58486
Traceback (most recent call last):

  File "<ipython-input-1-587546307fe9>", line 70, in <module>
    mdl.fit(xtr, ytr)

  File "C:\Users\Marcella\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 295, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)

  File "C:\Users\Marcella\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 586, in check_array
    context))

ValueError: Found array with 0 sample(s) (shape=(0, 4)) while a minimum of 1 is required.
Could anyone help me with this? Thank you so much in advance!
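A likely cause: in the "SEGUNDO TREINO DE ERRO" loop, the first iteration uses `Timeline = 1`, so `train = data2[data2['Timeline'] < 1]` selects no rows and `mdl.fit(xtr, ytr)` receives an empty `xtr`. The output supports this: the first loop printed a result for Timeline 1, and the second loop failed before printing anything. The first loop never calls `fit`, which is why it ran through all 35 timelines. One way around it is to skip any split whose training or validation set is empty. The sketch below uses a hypothetical toy DataFrame that only mimics the columns of `data2`; `walk_forward` is an illustrative helper, and `n_estimators` is reduced to keep it fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error


def rmsle(ytrue, ypred):
    return np.sqrt(mean_squared_log_error(ytrue, ypred))


# Hypothetical stand-in for data2: 3 products (Code) x 6 months (Timeline).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Code": np.repeat([101, 102, 103], 6),
    "Timeline": np.tile(np.arange(1, 7), 3),
    "Quantity": rng.integers(0, 50, size=18),
})
# Rows sorted by Code, then Timeline, so shift(1) is the previous month.
# (The post uses shift(-1), which is the previous month only if rows run
# from newest to oldest within each Code.)
toy = toy.sort_values(["Code", "Timeline"])
toy["Last_Month_Quantity"] = toy.groupby("Code")["Quantity"].shift(1)
toy["Last_Month_Diff"] = toy.groupby("Code")["Last_Month_Quantity"].diff()
toy = toy.dropna()  # leaves Timeline 3..6 for every Code


def walk_forward(df, first, last, n_estimators=50):
    """Train on months < t, validate on month t, skipping empty splits."""
    errors = {}
    for t in range(first, last + 1):
        train = df[df["Timeline"] < t]
        val = df[df["Timeline"] == t]
        if train.empty or val.empty:
            # An empty training set is exactly what made mdl.fit() raise.
            continue
        xtr, xts = train.drop(columns=["Quantity"]), val.drop(columns=["Quantity"])
        ytr, yts = train["Quantity"].values, val["Quantity"].values
        mdl = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
        mdl.fit(xtr, ytr)
        errors[t] = rmsle(yts, mdl.predict(xts))
    return errors


errors = walk_forward(toy, 1, 6)
for t, e in errors.items():
    print("Timeline %d - Error %.5f" % (t, e))
print("Mean Error = %.5f" % np.mean(list(errors.values())))
```

In the posted script, the same effect comes from adding the `if train.empty or val.empty: continue` guard just before `mdl.fit(xtr, ytr)`, or from starting the second loop at `range(2, 36)`. The guard is the more robust option if the earliest Timeline is not always 1.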


Messages In This Thread
ValueError: Found array with 0 samples - by marcellam - Apr-19-2020, 06:12 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Separating unique, stable, samples using pandas keithpfio 1 1,115 Jun-20-2022, 07:06 PM
Last Post: keithpfio
  RandomForest --ValueError: setting an array element with a sequence JaneTan 0 1,735 Sep-08-2021, 02:12 AM
Last Post: JaneTan
  ValueError: Found input variables with inconsistent numbers of samples: [5, 6] bongielondy 6 25,907 Jun-28-2021, 05:23 AM
Last Post: ricslato
  ValueError: Found input variables with inconsistent numbers of sample robert2joe 0 4,266 Mar-25-2020, 11:10 AM
Last Post: robert2joe
  ValueError: Found input variables AhmadMWaddah 3 3,738 Mar-03-2020, 10:19 PM
Last Post: AhmadMWaddah
  ValueError: could not broadcast input array from shape (75) into shape (25) route2sabya 0 6,494 Mar-14-2019, 01:14 PM
Last Post: route2sabya
  ValueError: Found input variables with inconsistent numbers of samples: [0, 3] ayaz786amd 2 9,619 Nov-27-2018, 07:12 AM
Last Post: ayaz786amd
  ValueError: The truth value of an array with more than one element is ambiguous. Eliza5 1 14,341 Apr-02-2018, 12:03 AM
Last Post: scidam
  pandas: assemble data to have samples sdcompanies 2 3,321 Jan-19-2018, 09:45 PM
Last Post: Larz60+
