Running Standard Scaler in Python 3

Led_Zeppelin · (This post was last modified: Sep-05-2022, 05:57 PM by Led_Zeppelin.)

I am trying to get the following code to work:

df.head()

df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)

df3=pd.DataFrame(df2)

type(df2)

df3.head()

At this point it fails when I try to put df3 through a StandardScaler. Here is the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [91], in <cell line: 2>()
      1 scaler=StandardScaler()
----> 2 df4=scaler.fit_transform(df3)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\base.py:867, in TransformerMixin.fit_transform(self, X, y, **fit_params)
    863 # non-optimized default implementation; override when a better
    864 # method is possible for a given clustering algorithm
    865 if y is None:
    866     # fit method of arity 1 (unsupervised transformation)
--> 867     return self.fit(X, **fit_params).transform(X)
    868 else:
    869     # fit method of arity 2 (supervised transformation)
    870     return self.fit(X, y, **fit_params).transform(X)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\preprocessing\_data.py:809, in StandardScaler.fit(self, X, y, sample_weight)
    807 # Reset internal state before fitting
    808 self._reset()
--> 809 return self.partial_fit(X, y, sample_weight)

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\preprocessing\_data.py:844, in StandardScaler.partial_fit(self, X, y, sample_weight)
    812 """Online computation of mean and std on X for later scaling.
    813 
    814 All of X is processed as a single batch. This is intended for cases
   (...)
    841     Fitted scaler.
    842 """
    843 first_call = not hasattr(self, "n_samples_seen_")
--> 844 X = self._validate_data(
    845     X,
    846     accept_sparse=("csr", "csc"),
    847     dtype=FLOAT_DTYPES,
    848     force_all_finite="allow-nan",
    849     reset=first_call,
    850 )
    851 n_features = X.shape[1]
    853 if sample_weight is not None:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\base.py:577, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    575     raise ValueError("Validation should be done on X, y or both.")
    576 elif not no_val_X and no_val_y:
--> 577     X = check_array(X, input_name="X", **check_params)
    578     out = X
    579 elif no_val_X and not no_val_y:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\sklearn\utils\validation.py:768, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    764     pandas_requires_conversion = any(
    765         _pandas_dtype_needs_early_conversion(i) for i in dtypes_orig
    766     )
    767     if all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig):
--> 768         dtype_orig = np.result_type(*dtypes_orig)
    770 if dtype_numeric:
    771     if dtype_orig is not None and dtype_orig.kind == "O":
    772         # if input is object, convert to float.

File <__array_function__ internals>:180, in result_type(*args, **kwargs)

ValueError: at least one array or dtype is required

df.index

I am not sure what this error is telling me.

I just want to get df3 into a form that StandardScaler can use. I think it can only accept a DataFrame, so I convert df2 to a DataFrame in the last step before sending it to StandardScaler. Then I get the error. What am I doing wrong here? It seems okay.

Any help appreciated.

Respectfully,

LZ

**deanhystad** · (This post was last modified: Sep-05-2022, 09:33 PM by deanhystad.)

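It is telling you that the scaler received an empty DataFrame, because you are effectively doing this:

df3=pd.DataFrame(None)

That is what happens because df2 is None. df2 is None because drop() returns None when it is called with inplace=True, as in your line: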

df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)

From the pandas documentation
https://pandas.pydata.org/pandas-docs/st....drop.html

Quote: inplace : bool, default False
If False, return a copy. Otherwise, do operation inplace and return None.
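To see the quoted behavior directly, here is a minimal sketch. The small df is a made-up stand-in for the original data (only the two dropped column names come from the post; sensor_00 and all values are hypothetical), and it assumes a pandas version that behaves as the quoted documentation describes.

```python
import pandas as pd

# Hypothetical stand-in for the original df; the values are invented.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "timestamp": ["2018-04-01 00:00", "2018-04-01 00:01", "2018-04-01 00:02"],
    "sensor_00": [2.46, 2.47, 2.44],
})

# With inplace=True, drop() modifies the frame it is called on
# and returns None, so there is nothing useful to assign.
result = df.copy().drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)
print(result)

# Wrapping None in a DataFrame does not raise; it yields an empty frame,
# which is what the scaler later fails on.
empty = pd.DataFrame(result)
print(empty.shape)
```

Checking type(df2) right after the drop, as in the original post, would have shown NoneType rather than DataFrame.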

So either you can do this:

df2 = df.drop(["Unnamed: 0", "timestamp"], axis=1)

Or you can do this:

df.drop(["Unnamed: 0", "timestamp"], axis=1, inplace=True)

Assigning the result and passing inplace=True should never be combined, because with inplace=True the method returns None.
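Putting it together, a sketch of the whole step might look like the following. The sample frame is hypothetical (only the dropped column names come from the post), and the names features and df4 are chosen here for illustration. Note that StandardScaler accepts any 2-D array-like, not only a DataFrame, and by default returns a NumPy array, so the result is wrapped back into a DataFrame if the column labels are still needed.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the original df; the values are invented.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "timestamp": ["2018-04-01 00:00", "2018-04-01 00:01", "2018-04-01 00:02"],
    "sensor_00": [2.46, 2.47, 2.44],
    "sensor_01": [47.1, 47.3, 46.9],
})

# Assign the result of drop() and leave inplace at its default (False).
features = df.drop(["Unnamed: 0", "timestamp"], axis=1)

# fit_transform returns a NumPy array by default.
scaler = StandardScaler()
scaled = scaler.fit_transform(features)

# Restore the column labels and index from the unscaled frame.
df4 = pd.DataFrame(scaled, columns=features.columns, index=features.index)
print(df4.head())
```

After scaling, each column of df4 should have a mean of approximately zero, which is a quick sanity check that the scaler saw real data.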
