Running Standard Scaler in Python 3

Running Standard Scaler in Python 3 - Led_Zeppelin - Sep-05-2022

I am trying to get the following code to work:

df.head()

df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)

df3=pd.DataFrame(df2)

type(df2)

df3.head()

At this point, it fails when I try to pass df3 through a StandardScaler. Here is the code and the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [91], in <cell line: 2>()
      1 scaler=StandardScaler()
----> 2 df4=scaler.fit_transform(df3)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\base.py:867, in TransformerMixin.fit_transform(self, X, y, **fit_params)
    863 # non-optimized default implementation; override when a better
    864 # method is possible for a given clustering algorithm
    865 if y is None:
    866     # fit method of arity 1 (unsupervised transformation)
--> 867     return self.fit(X, **fit_params).transform(X)
    868 else:
    869     # fit method of arity 2 (supervised transformation)
    870     return self.fit(X, y, **fit_params).transform(X)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\preprocessing\_data.py:809, in StandardScaler.fit(self, X, y, sample_weight)
    807 # Reset internal state before fitting
    808 self._reset()
--> 809 return self.partial_fit(X, y, sample_weight)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\preprocessing\_data.py:844, in StandardScaler.partial_fit(self, X, y, sample_weight)
    812 """Online computation of mean and std on X for later scaling.
    813 
    814 All of X is processed as a single batch. This is intended for cases
   (...)
    841     Fitted scaler.
    842 """
    843 first_call = not hasattr(self, "n_samples_seen_")
--> 844 X = self._validate_data(
    845     X,
    846     accept_sparse=("csr", "csc"),
    847     dtype=FLOAT_DTYPES,
    848     force_all_finite="allow-nan",
    849     reset=first_call,
    850 )
    851 n_features = X.shape[1]
    853 if sample_weight is not None:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\base.py:577, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    575     raise ValueError("Validation should be done on X, y or both.")
    576 elif not no_val_X and no_val_y:
--> 577     X = check_array(X, input_name="X", **check_params)
    578     out = X
    579 elif no_val_X and not no_val_y:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:768, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    764     pandas_requires_conversion = any(
    765         _pandas_dtype_needs_early_conversion(i) for i in dtypes_orig
    766     )
    767     if all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig):
--> 768         dtype_orig = np.result_type(*dtypes_orig)
    770 if dtype_numeric:
    771     if dtype_orig is not None and dtype_orig.kind == "O":
    772         # if input is object, convert to float.

File <__array_function__ internals>:180, in result_type(*args, **kwargs)

ValueError: at least one array or dtype is required


I am not sure what this error is talking about.

I just want to get df3 into a form that StandardScaler can use. I think it only accepts a DataFrame, so I convert df2 to a DataFrame in the last step before passing it to StandardScaler. Then I get the error. What am I doing wrong here? It seems okay.

Any help appreciated.

Respectfully,

LZ

RE: Running Standard Scaler in Python 3 - deanhystad - Sep-05-2022

It is telling you that you cannot do this:

df3=pd.DataFrame(None)

That is what you are doing, because df2 is None. df2 is None because you cannot do this:

df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)
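To connect this to the message in the traceback, here is a minimal sketch of the chain (assuming recent pandas and NumPy). pd.DataFrame(None) does not raise; it builds a frame with no columns. The check_array code shown in the traceback (scikit-learn around 1.1) collects the column dtypes and passes them to np.result_type, which raises the same ValueError when it gets no arguments. Newer scikit-learn versions may reject the empty input earlier with a different message, but the root cause is the same.

```python
import numpy as np
import pandas as pd

# df2 is None, because drop(..., inplace=True) returns None
df2 = None

# pd.DataFrame(None) does not raise; it quietly builds an empty frame
df3 = pd.DataFrame(df2)
print(df3.shape)  # (0, 0): no rows and no columns

# check_array (see the traceback) gathers the column dtypes of the DataFrame...
dtypes_orig = list(df3.dtypes)
print(dtypes_orig)  # []: no columns means no dtypes

# ...and hands them to np.result_type, which needs at least one argument
raised = False
try:
    np.result_type(*dtypes_orig)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```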

From the pandas documentation
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

Quote: inplace : bool, default False
If False, return a copy. Otherwise, do operation inplace and return None.
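A small sketch of what that quote means in practice, using a hypothetical three-column frame (the "value" column is made up for illustration): with the default inplace=False you get a new DataFrame back and df is untouched; with inplace=True, df itself changes and the call returns None.

```python
import pandas as pd

# Hypothetical frame: the two columns being dropped plus one made-up data column
df = pd.DataFrame({"Unnamed: 0": [0, 1], "timestamp": ["t0", "t1"], "value": [1.0, 2.0]})

# inplace=False (the default): a new DataFrame is returned, df keeps all columns
dropped = df.drop(["Unnamed: 0", "timestamp"], axis=1)
print(list(dropped.columns))  # ['value']
print(list(df.columns))       # ['Unnamed: 0', 'timestamp', 'value']

# inplace=True: df itself is modified and the return value is None
result = df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)
print(result)                 # None
print(list(df.columns))       # ['value']
```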

So either you can do this:

df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1)

Or you can do this:

df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)

Assigning the result and using inplace=True should never be combined: with inplace=True, drop() modifies df and returns None, so the variable you assign to is always None.
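Putting it together, a minimal sketch of the corrected pipeline. The data is a hypothetical stand-in for the real CSV: "Unnamed: 0" is the kind of leftover index column pandas typically produces when reading a CSV written with its index, and sensor_a/sensor_b are made-up numeric columns. Note that the extra pd.DataFrame(...) step is not needed: StandardScaler.fit_transform accepts a DataFrame directly (NumPy arrays work too) and returns a NumPy array, which you can wrap back into a DataFrame if you want the column names.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: leftover index column, a timestamp, two made-up numeric columns
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "timestamp": ["2022-09-01 00:00", "2022-09-01 00:01", "2022-09-01 00:02"],
    "sensor_a": [1.0, 2.0, 3.0],
    "sensor_b": [10.0, 20.0, 30.0],
})

# Keep the returned copy (no inplace=True), so df2 is a real DataFrame
df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1)

# fit_transform takes the DataFrame as-is and returns a NumPy array
scaler = StandardScaler()
scaled = scaler.fit_transform(df2)

# Optional: restore column names and index for easier inspection
df_scaled = pd.DataFrame(scaled, columns=df2.columns, index=df2.index)
print(df_scaled)
```

Each scaled column ends up with mean 0 and population standard deviation 1, since StandardScaler uses ddof=0.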