Jul-27-2022, 04:33 PM
When I run the program with the following final code, I get the error shown.
I think I know the answer to this so please be patient and I will check if my idea works.
Error:
ValueError Traceback (most recent call last)
Input In [34], in <cell line: 2>()
1 model = ExtraTreesClassifier()
----> 2 model.fit(X, y)
3 print(model.feature_importances_)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\ensemble\_forest.py:331, in BaseForest.fit(self, X, y, sample_weight)
329 if issparse(y):
330 raise ValueError("sparse multilabel-indicator for y is not supported.")
--> 331 X, y = self._validate_data(
332 X, y, multi_output=True, accept_sparse="csc", dtype=DTYPE
333 )
334 if sample_weight is not None:
335 sample_weight = _check_sample_weight(sample_weight, X)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\base.py:596, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
594 y = check_array(y, input_name="y", **check_y_params)
595 else:
--> 596 X, y = check_X_y(X, y, **check_params)
597 out = X, y
599 if not no_val_X and check_params.get("ensure_2d", True):
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:1090, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
1070 raise ValueError(
1071 f"{estimator_name} requires y to be passed, but the target y is None"
1072 )
1074 X = check_array(
1075 X,
1076 accept_sparse=accept_sparse,
(...)
1087 input_name="X",
1088 )
-> 1090 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
1092 check_consistent_length(X, y)
1094 return X, y
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:1100, in _check_y(y, multi_output, y_numeric, estimator)
1098 """Isolated part of check_X_y dedicated to y validation"""
1099 if multi_output:
-> 1100 y = check_array(
1101 y,
1102 accept_sparse="csr",
1103 force_all_finite=True,
1104 ensure_2d=False,
1105 dtype=None,
1106 input_name="y",
1107 estimator=estimator,
1108 )
1109 else:
1110 estimator_name = _check_estimator_name(estimator)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
893 raise ValueError(
894 "Found array with dim %d. %s expected <= 2."
895 % (array.ndim, estimator_name)
896 )
898 if force_all_finite:
--> 899 _assert_all_finite(
900 array,
901 input_name=input_name,
902 estimator_name=estimator_name,
903 allow_nan=force_all_finite == "allow-nan",
904 )
906 if ensure_min_samples > 0:
907 n_samples = _num_samples(array)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
124 if (
125 not allow_nan
126 and estimator_name
(...)
130 # Improve the error message on how to handle missing values in
131 # scikit-learn.
132 msg_err += (
133 f"\n{estimator_name} does not accept missing values"
134 " encoded as NaN natively. For supervised learning, you might want"
(...)
144 "#estimators-that-handle-nan-values"
145 )
--> 146 raise ValueError(msg_err)
148 # for object dtype data, we only check for NaNs (GH-13254)
149 elif X.dtype == np.dtype("object") and not allow_nan:
ValueError: Input y contains NaN.
I have changed machine_status to numeric values using the following statement.
df2["machine_status"] = df2["machine_status"].map(lambda x: 0 if x == "NORMAL" else 1 if x == "BROKEN" else 2 if x == "BROKEN" else np.NaN)
Now machine status is either normal, recovering or broken. But I combine Recovering with Broken. ...
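A likely cause, sketched under assumptions: the mapping tests "BROKEN" twice and never tests for the recovering state, so any such row falls through to NaN, which would explain "Input y contains NaN". The sample frame below is hypothetical (the real df2 is not shown), and the label spelling "RECOVERING" is a guess. The sketch uses np.nan because the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2; the label "RECOVERING" is an assumed spelling.
df2 = pd.DataFrame({"machine_status": ["NORMAL", "RECOVERING", "BROKEN", "NORMAL"]})

def original_map(x):
    # Same logic as the posted lambda: the second "BROKEN" test is unreachable,
    # so anything other than NORMAL or BROKEN becomes NaN.
    return 0 if x == "NORMAL" else 1 if x == "BROKEN" else 2 if x == "BROKEN" else np.nan

mapped = df2["machine_status"].map(original_map)
nan_count = int(mapped.isna().sum())  # rows that would make fit() reject y

# Possible fix: an explicit dictionary that folds RECOVERING into BROKEN (class 1).
status_codes = {"NORMAL": 0, "BROKEN": 1, "RECOVERING": 1}
y = df2["machine_status"].map(status_codes)

# Any label missing from the dictionary still maps to NaN, so list the leftovers.
unmapped = df2.loc[y.isna(), "machine_status"].unique()
```

If unmapped comes back non-empty on the real data, those labels (or genuinely missing values) would need to be added to the dictionary or dropped before calling model.fit(X, y).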