Python Forum

Feature Scaling with Partitions
I would like to apply normalization to a column in a Pandas DataFrame. However, I would like to partition the table by shop_id and normalize item_cnt_day separately within each shop_id group. Here's the dataset link if you're interested.

Does anyone know a method to achieve this result? Custom code is welcome! Thanks.

rocketfish
Something like the following should work:

df['item_cnt_day'] = df.groupby('shop_id')['item_cnt_day'].transform(lambda x: (x-x.mean())/x.std())

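However, if a group (the set of records with the same shop_id value) consists of only one element, this per-group scaling
will yield NaN, since x.std() returns NaN for a single-element Series (pandas uses the sample standard deviation, ddof=1, by default).
A group whose values are all identical also yields NaN, because x.std() is 0 there and the expression becomes 0/0.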
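One way to guard against those NaN results is to fall back to zeros whenever a group's standard deviation is missing or zero. This is only a minimal sketch, assuming that 0 is an acceptable scaled value for such groups. The helper name zscore_or_zero, the item_cnt_scaled column, and the sample data are made up for illustration.

```python
import pandas as pd


def zscore_or_zero(x: pd.Series) -> pd.Series:
    """Standardize one group; return zeros if its std is NaN or 0 (hypothetical helper)."""
    std = x.std()  # pandas default is ddof=1, so a one-row group gives NaN
    if pd.isna(std) or std == 0:
        # Assumption: map degenerate groups to 0.0 instead of NaN
        return pd.Series(0.0, index=x.index)
    return (x - x.mean()) / std


# Illustrative data: shop 2 has a single row, shop 3 has constant values
df = pd.DataFrame({
    "shop_id": [1, 1, 1, 2, 3, 3],
    "item_cnt_day": [1.0, 2.0, 3.0, 5.0, 4.0, 4.0],
})

# transform() returns a result aligned with the original rows
df["item_cnt_scaled"] = (
    df.groupby("shop_id")["item_cnt_day"].transform(zscore_or_zero)
)
print(df)
```

Writing to a new column keeps the raw counts around for comparison. Assign back to item_cnt_day instead if you want to overwrite them, as in the one-liner above.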
@scidam - Thank you for your quick response! This is exactly the elegant solution I was hoping for.
For any others who are interested, here's a helpful explanation of how transform() works.

https://pbpython.com/pandas_transform.html