Feature Scaling with Partitions - rocketfish - Aug-11-2020

I would like to apply normalization to a column in a Pandas DataFrame. However, I would like to partition the table by shop_id and normalize item_cnt_day separately within each shop_id group. Here's the dataset link if you're interested. Does anyone know a method to achieve this result? Custom code is welcome!

Thanks.
rocketfish

RE: Feature Scaling with Partitions - scidam - Aug-12-2020

Something like the following should work:

    df['item_cnt_day'] = df.groupby('shop_id')['item_cnt_day'].transform(lambda x: (x - x.mean()) / x.std())

However, if a group (the set of records with the same shop_id value) consists of only one element, this per-group scaling will yield NaN, since x.std() returns NaN for a single-element Series (pandas computes the sample standard deviation with ddof=1 by default). A group whose values are all identical has x.std() equal to 0 and also ends up as NaN, because the division is 0/0.

RE: Feature Scaling with Partitions - rocketfish - Aug-12-2020

@scidam - Thank you for your quick response! This is exactly the elegant solution I was hoping for.

RE: Feature Scaling with Partitions - rocketfish - Aug-13-2020

For any others who are interested, here's a helpful explanation of how transform() works: https://pbpython.com/pandas_transform.html
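
For anyone who runs into the NaN caveat mentioned above, here is a minimal sketch of one way to guard against it. It is not from the thread: the helper name zscore_within_group, the output column item_cnt_scaled, and the toy data are all made up for illustration, and mapping groups without any spread to 0.0 is an assumption (you may prefer to keep NaN or drop those rows).

```python
import pandas as pd


def zscore_within_group(values: pd.Series) -> pd.Series:
    """Standardize one group of values (hypothetical helper, not from the thread)."""
    std = values.std()  # sample standard deviation (ddof=1 by default)
    if pd.isna(std) or std == 0:
        # Assumption: a group with a single record or with identical values
        # has no spread, so map it to 0.0 instead of producing NaN.
        return pd.Series(0.0, index=values.index)
    return (values - values.mean()) / std


# Toy data standing in for the real dataset (made up for this example).
df = pd.DataFrame({
    "shop_id": [1, 1, 1, 2, 2, 3],
    "item_cnt_day": [1.0, 2.0, 3.0, 5.0, 5.0, 7.0],
})

# transform() returns a result aligned with the original rows,
# so it can be assigned straight back as a new column.
df["item_cnt_scaled"] = (
    df.groupby("shop_id")["item_cnt_day"].transform(zscore_within_group)
)
print(df)
```

With this toy data, shop 1 is scaled to -1.0, 0.0, 1.0, while shop 2 (identical values) and shop 3 (a single record) come out as 0.0 rather than NaN. Writing to a new column instead of overwriting item_cnt_day keeps the raw counts around for checking.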