Feature Scaling with Partitions

Feature Scaling with Partitions - rocketfish - Aug-11-2020

I would like to apply normalization to a column in a Pandas DataFrame. However, I would like to partition the table by shop_id and normalize item_cnt_day separately within each shop_id. Here's the dataset link if you're interested.

Does anyone know a method to achieve this result?

Custom code is welcome! Thanks.

rocketfish

RE: Feature Scaling with Partitions - scidam - Aug-12-2020

Something like the following should work,

df['item_cnt_day'] = df.groupby('shop_id')['item_cnt_day'].transform(lambda x: (x-x.mean())/x.std())

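However, if a group (the set of records with the same shop_id value) consists of only one element, this per-group scaling
will yield NaN, since x.std() is NaN for a single-element Series (pandas computes the sample standard deviation, ddof=1, by default).
A group whose values are all identical also gives NaN, because x.std() is 0 there and the division becomes 0/0.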
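
If the NaN results are a problem, one possible workaround is to guard against an undefined or zero standard deviation inside the function passed to transform(). The sketch below is only an illustration: the helper name zscore_within_group, the toy data, the new column item_cnt_scaled, and the choice to map such groups to 0.0 are all assumptions, not part of the original answer.

```python
import pandas as pd

def zscore_within_group(x: pd.Series) -> pd.Series:
    """Standardize one group; hypothetical helper, not a pandas function."""
    std = x.std()  # sample std (ddof=1): NaN for a single-element group
    if pd.isna(std) or std == 0:
        # Assumption: treat groups with no spread as already centered.
        return pd.Series(0.0, index=x.index)
    return (x - x.mean()) / std

# Toy data: shop 2 has a single row, shop 3 has identical values.
df = pd.DataFrame({
    "shop_id": [1, 1, 1, 2, 3, 3],
    "item_cnt_day": [1.0, 2.0, 3.0, 5.0, 4.0, 4.0],
})

df["item_cnt_scaled"] = (
    df.groupby("shop_id")["item_cnt_day"].transform(zscore_within_group)
)
print(df)
```

Whether 0.0 is the right fill value depends on the downstream model; leaving NaN and dropping or imputing those rows later is just as defensible.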

RE: Feature Scaling with Partitions - rocketfish - Aug-12-2020

@scidam - Thank you for your quick response! This is exactly the elegant solution I was hoping for.

RE: Feature Scaling with Partitions - rocketfish - Aug-13-2020

For any others who are interested, here's a helpful explanation of how transform() works.

https://pbpython.com/pandas_transform.html
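
For a quick feel of why transform() fits this problem, here is a small sketch on made-up data (the values and variable names are assumptions for illustration only). An aggregation returns one value per shop_id, while transform() returns one value per original row, aligned to the DataFrame's index, which is what allows the result to be assigned straight back to a column.

```python
import pandas as pd

df = pd.DataFrame({
    "shop_id": [1, 1, 2],
    "item_cnt_day": [2.0, 4.0, 10.0],
})

grouped = df.groupby("shop_id")["item_cnt_day"]

# Aggregation: one row per group, indexed by shop_id.
per_shop = grouped.mean()

# transform: one row per original record, indexed like df.
per_row = grouped.transform("mean")

print(per_shop)
print(per_row)
```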