Python Forum
Filter and lambda question - Printable Version




Filter and lambda question - smw10c - Apr-27-2017

Below I am using a filter with a lambda expression. Up until now, in my course, the lambda argument (the "g" here) referred to a specific cell in a data frame, for instance by_company.apply(lambda x: x*2). How does it recognize that it's supposed to be looking at the 'Units' column if g refers to a specific cell? I didn't know you could filter on a cell to obtain a column. I assume this may relate to my previous question about groupby objects.

by_company = sales.groupby('Company')
by_company.filter(lambda g:g['Units'].sum() > 35)



RE: Filter and lambda question - nilamo - Apr-27-2017

http://pandas.pydata.org/pandas-docs/stable/groupby.html#filtration

When filtering grouped data (in pandas, which I'm guessing you're using), the function you pass to filter receives the entire group, not a particular cell. So yes, you can keep or drop whole groups based on the values within each group.

...I guess. I've never actually used pandas.


RE: Filter and lambda question - smw10c - Apr-27-2017


Thank you!


RE: Filter and lambda question - zivoni - Apr-27-2017

As nilamo said, filter keeps or drops entire subsets of the original dataframe. This is similar to your previous question about the groups attribute of a groupby object: for every unique value company in the Company column, your lambda gets the dataframe sales[sales.Company==company] as its argument. It then selects the Units column, sums it, and either keeps the entire group (if the sum is greater than 35) or drops it. So your result is a dataframe containing all rows for the companies that sold more than 35 units.
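To make that concrete, here is a minimal sketch. The contents of sales are made up (the thread never shows the real data), and the helper name enough_units is hypothetical; it just stands in for the lambda so we can record what it receives.

```python
import pandas as pd

# Hypothetical data: the thread does not show the real `sales` frame.
sales = pd.DataFrame({
    "Company": ["Acme", "Acme", "Initech", "Initech", "Hooli"],
    "Units": [20, 18, 10, 5, 40],
})

by_company = sales.groupby("Company")

seen = []  # type names of the objects passed to the callable


def enough_units(g):
    # g is the sub-frame for one company, not a single cell
    seen.append(type(g).__name__)
    return g["Units"].sum() > 35


# Keeps every row of each group whose Units total exceeds 35
kept = by_company.filter(enough_units)

# The same selection written without filter: broadcast each group's
# total back onto its rows, then use it as a boolean mask.
totals = sales.groupby("Company")["Units"].transform("sum")
manual = sales[totals > 35]

print(kept)
```

With this data, Acme (38 units) and Hooli (40 units) are kept and Initech (15 units) is dropped, and the surviving rows keep their original index.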

Btw, df.apply() applies a function to each entire row or column; the function is applied to a single "cell" only when you use apply on a pandas Series. Quite often the two are interchangeable: if you want to apply a function to a column of a dataframe, you can use either (trivial example)
sales.apply(lambda g: 2 * g['Units'], axis=1)  # lambda function used on entire row, selects "column"
or
sales.Units.apply(lambda g: 2 * g)  # lambda function used on "cell" - column was already selected
Usually the second form is preferred.
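A quick way to check that the two forms agree, again on made-up data. The plain arithmetic version at the end is not mentioned in the thread; it is included only as a point of comparison for a trivial case like doubling.

```python
import pandas as pd

# Hypothetical data, for illustration only.
sales = pd.DataFrame({"Company": ["Acme", "Hooli"], "Units": [20, 40]})

# Row-wise: the lambda receives each row as a Series and picks out Units
row_wise = sales.apply(lambda row: 2 * row["Units"], axis=1)

# Element-wise: the column is selected first, so the lambda receives one value
cell_wise = sales.Units.apply(lambda u: 2 * u)

# Plain arithmetic on the column gives the same values without apply
vectorized = sales.Units * 2

print(row_wise.tolist(), cell_wise.tolist(), vectorized.tolist())
```

All three produce the same values; they differ only in what the function is handed (a row, a single value, or nothing at all).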