Python Forum

When I run the program with the following final code, I get the error shown.

Error:
ValueError                                Traceback (most recent call last)
Input In [34], in <cell line: 2>()
      1 model = ExtraTreesClassifier()
----> 2 model.fit(X, y)
      3 print(model.feature_importances_)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\ensemble\_forest.py:331, in BaseForest.fit(self, X, y, sample_weight)
    329 if issparse(y):
    330     raise ValueError("sparse multilabel-indicator for y is not supported.")
--> 331 X, y = self._validate_data(
    332     X, y, multi_output=True, accept_sparse="csc", dtype=DTYPE
    333 )
    334 if sample_weight is not None:
    335     sample_weight = _check_sample_weight(sample_weight, X)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\base.py:596, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    594         y = check_array(y, input_name="y", **check_y_params)
    595     else:
--> 596         X, y = check_X_y(X, y, **check_params)
    597     out = X, y
    599 if not no_val_X and check_params.get("ensure_2d", True):

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:1090, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
   1070     raise ValueError(
   1071         f"{estimator_name} requires y to be passed, but the target y is None"
   1072     )
   1074 X = check_array(
   1075     X,
   1076     accept_sparse=accept_sparse,
   (...)
   1087     input_name="X",
   1088 )
-> 1090 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
   1092 check_consistent_length(X, y)
   1094 return X, y

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:1100, in _check_y(y, multi_output, y_numeric, estimator)
   1098 """Isolated part of check_X_y dedicated to y validation"""
   1099 if multi_output:
-> 1100     y = check_array(
   1101         y,
   1102         accept_sparse="csr",
   1103         force_all_finite=True,
   1104         ensure_2d=False,
   1105         dtype=None,
   1106         input_name="y",
   1107         estimator=estimator,
   1108     )
   1109 else:
   1110     estimator_name = _check_estimator_name(estimator)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input y contains NaN.

I have changed machine_status to numeric values using the following code.

df2["machine_status"] = df2["machine_status"].map(
    lambda x: 0
    if x == "NORMAL"
    else 1
    if x == "BROKEN"
    else 2
    if x == "BROKEN"
    else np.NaN
)

Now machine status is either normal, recovering or broken. But I combine Recovering with Broken. ...

I think I know the answer to this so please be patient and I will check if my idea works.

I have the answer to this question. Thus, there is no need to answer.

Respectfully,

LZ

If you have the answer you should post the answer.

How was BROKEN(1) different from BROKEN(2)? This works for me, but I never get 2 as a status code for obvious reasons.

import pandas as pd
import numpy as np

df = pd.DataFrame({"status": ["NORMAL", "BROKEN", "BROKEN", "UNKNOWN"]})
df["code"] = df["status"].map(
    lambda x:
    0 if x == "NORMAL" else
    1 if x == "BROKEN" else 
    np.nan
)
print(df)

Output:
    status  code
0   NORMAL   0.0
1   BROKEN   1.0
2   BROKEN   1.0
3  UNKNOWN   NaN
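To spell out why 2 can never appear, here is a minimal sketch that wraps the chained conditional from the question in a plain function. The function name `encode_status` is made up for illustration, and `float("nan")` stands in for `np.NaN` so the snippet needs no third-party packages. It assumes the data contains the label "RECOVERING", as the original poster describes.

```python
import math


def encode_status(x):
    # Same chain of conditions as the lambda in the question.
    return (
        0 if x == "NORMAL"
        else 1 if x == "BROKEN"
        else 2 if x == "BROKEN"  # unreachable: "BROKEN" already matched above
        else float("nan")        # every other label, e.g. "RECOVERING", lands here
    )


for label in ["NORMAL", "BROKEN", "RECOVERING"]:
    print(label, encode_status(label))

# "RECOVERING" is turned into NaN, which is what fit() later complains about.
print(math.isnan(encode_status("RECOVERING")))
```

The conditions are tested left to right, so the second `x == "BROKEN"` test is only reached when `x` is not "BROKEN" and can never be true.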

You ask a lot of lazy questions. It would be appreciated if you stepped up your effort and provided runnable examples instead of "Here's some code and the error trace". I find the effort of writing a nice short program that demonstrates the problem is often enough for me to find the answer myself. It makes me a better programmer. It would do the same for you.

Here is the original if-then-else statement

df2["machine_status"] = df2["machine_status"].map(
    lambda x: 0
    if x == "NORMAL"
    else 1
    if x == "BROKEN"
    else 2
    if x == "BROKEN"
    else np.NaN
)

I am changing strings such as NORMAL, BROKEN, or RECOVERING to numbers. The numbers are 0 if NORMAL and 1 if BROKEN or RECOVERING.

At first, I misunderstood the error. The interpreter said my machine_status column had NaNs and then it failed, and I thought it was confusing 0 as a number with 0 as a symbol. I did have a lot of NaNs and did not know what to do about them. But that was not the cause.

In fact, the if-then-else statement was clearly wrong.

So, I corrected the if-then-else statement to this:

df2["machine_status"] = df2["machine_status"].map(
    lambda x: 0
    if x == "NORMAL"
    else 1
)

And then my program worked as I wanted.
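One thing to keep in mind with the corrected lambda is that `else 1` sends every value other than "NORMAL" to 1, including misspelled or genuinely missing labels. As a possible alternative (a sketch, not the poster's method), the mapping can be written as a dictionary so that unexpected labels are reported instead of silently merged. The sample data and the name `STATUS_CODES` are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data; the real df2 comes from the poster's dataset.
df2 = pd.DataFrame({"machine_status": ["NORMAL", "RECOVERING", "BROKEN", "NORMAL"]})

# Explicit label-to-code table: RECOVERING is combined with BROKEN.
STATUS_CODES = {"NORMAL": 0, "BROKEN": 1, "RECOVERING": 1}

# Series.map with a dict yields NaN for any label missing from the dict.
codes = df2["machine_status"].map(STATUS_CODES)

# Fail early with a readable message instead of failing later inside fit().
unmapped = df2.loc[codes.isna(), "machine_status"].unique()
if len(unmapped) > 0:
    raise ValueError(f"Unmapped machine_status values: {list(unmapped)}")

df2["machine_status"] = codes.astype(int)
print(df2["machine_status"].tolist())
```

If a new status label ever shows up in the data, this version stops with a message naming the label rather than quietly treating it as a failure.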

Respectfully,

LZ

So the error is that fit(X, y) raises an error if you have NaNs in X or y, and your lambda to convert machine status to numbers was inserting NaNs.

Yes, it was. My initial guess was way off the mark.
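A quick way to confirm this kind of problem before calling fit is to count the missing values in the target. The following is a minimal sketch with a made-up target series. Dropping the unlabeled rows is only one possible remedy and assumes those rows can be discarded.

```python
import numpy as np
import pandas as pd

# Hypothetical target column with one missing value.
y = pd.Series([0.0, 1.0, np.nan, 0.0], name="machine_status")

# Count the NaNs that would make fit() raise "Input y contains NaN".
n_missing = int(y.isna().sum())
print(f"{n_missing} NaN value(s) in y")

# Keep only labeled rows. X would have to be filtered with the same
# mask so that features and target stay aligned.
mask = y.notna()
y_clean = y[mask].astype(int)
print(y_clean.tolist())
```

Checking the count first also shows whether the NaNs come from the raw data or were introduced by an earlier conversion step.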

Way off.

R,

LZ

I think you would get a lot of benefit from putting more effort into your posts. When I post I try to write a small example that other people can run but still demonstrates my problem. Often writing the example exposes the solution, and if nothing else demonstrating to others that I've really put some work into solving this problem and making it easy for them to help me will get me good answers quicker. Not to mention that writing small code examples for a specific purpose makes me a better programmer and a better debugger.

Led_Zeppelin

Led_Zeppelin

deanhystad

Led_Zeppelin

deanhystad

Led_Zeppelin

deanhystad