Python Forum

Googling this problem, it only seems to come up when using a classifier, but I'm just trying to normalise a column. It seems to need a 2D array, and I'm not sure how to make a 2D array out of my single column:

wgt_l1 = normalize(data.wgt, norm='l1')
wgt_l2 = normalize(data.wgt, norm='l2')

My weighting ranges from over 400,000 to below 20,000, so I should probably normalise it, but I can't get the method to run properly. Beyond the actual code, can I ask why a 2D array is needed to normalise?

Error msg:

Error:
~\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in normalize(X, norm, axis, copy, return_norm)
   1552 
   1553     X = check_array(X, sparse_format, copy=copy,
-> 1554                     estimator='the normalize function', dtype=FLOAT_DTYPES)
   1555     if axis == 0:
   1556         X = X.T

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    550                     "Reshape your data either using array.reshape(-1, 1) if "
    551                     "your data has a single feature or array.reshape(1, -1) "
--> 552                     "if it contains a single sample.".format(array))
    553 
    554         # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=[ 77516.  83311. 215646. ... 374983.  83891. 182148.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Here is a small sample of the numbers (I can change them to whatever type is necessary; they are currently floats because the error message mentioned float, but I'd rather they were ints):

Output:
45137    242136.0
45138    112115.0
45139     77132.0
45140    117909.0
45141    229647.0
45142    149347.0
45143     23157.0
45144     93977.0
45145    159691.0
45146    176967.0
45147    344436.0
45148    430340.0

I tried something like:

data.wgt.reshape(-1,1)

But then I get the error:

Error:
AttributeError: 'Series' object has no attribute 'reshape'

data['wgt'] = MinMaxScaler().fit_transform(data['wgt'].values.reshape(-1,1))

Got it working with this, though I still don't understand why it must be 2D.
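On the "why 2D" question: scikit-learn treats every input as a table in which rows are samples and columns are features. A 1D array doesn't say which of the two it is, which is why the error message offers both reshape(-1, 1) and reshape(1, -1). The sketch below shows that the choice changes the result; it uses a few values from the sample above, and the variable names are only for illustration. Note that normalize() works row by row by default (axis=1), so on a single column it may not do what you expect.

```python
import numpy as np
from sklearn.preprocessing import normalize

# A few of the weights from the sample above.
wgt = np.array([242136.0, 112115.0, 77132.0, 117909.0])

as_column = wgt.reshape(-1, 1)  # shape (4, 1): 4 samples, 1 feature
as_row = wgt.reshape(1, -1)     # shape (1, 4): 1 sample, 4 features

# Default axis=1: each row is scaled on its own, so every
# single-value row is divided by its own absolute value.
per_row = normalize(as_column, norm='l1')

# axis=0: the whole column is treated as one vector.
per_column = normalize(as_column, norm='l1', axis=0)

# Same numbers laid out as a single sample.
one_sample = normalize(as_row, norm='l1')

print(per_row.ravel())
print(per_column.ravel())
print(one_sample.ravel())
```

With the column layout and the default axis every value becomes 1, which is probably not what you want for a weight column. Passing axis=0, or using a column-wise scaler such as MinMaxScaler, scales the values relative to each other.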

As per the error message, data.wgt is a pandas.Series object. You can get the data in a numpy array, which has the 'reshape' method, by accessing the 'values' attribute.

data.wgt.values.reshape(-1, 1)

A quick look at the pandas docs points towards using 'to_numpy()' instead:

data.wgt.to_numpy().reshape(-1, 1)
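Putting it together, here is a minimal sketch of the whole round trip. The DataFrame is a made-up stand-in for yours, and the 'wgt_scaled' column name is just an assumption for the example. Selecting with double brackets is another way to get something 2D without calling reshape at all.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the poster's DataFrame.
data = pd.DataFrame({"wgt": [242136.0, 112115.0, 77132.0, 23157.0, 430340.0]})

# Series -> 1D NumPy array -> 2D column of shape (n_samples, 1).
column = data["wgt"].to_numpy().reshape(-1, 1)

# Alternative: double brackets select a one-column DataFrame,
# which is already 2D.
same_column = data[["wgt"]]

# MinMaxScaler rescales each column to the range [0, 1].
scaled = MinMaxScaler().fit_transform(column)

# ravel() flattens the (n, 1) result back to 1D before storing it.
data["wgt_scaled"] = scaled.ravel()

print(data)
```

Writing the result to a new column keeps the original weights around in case you need them later.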

brkolvr

boring_accountant