Python Forum

Full Version: Filter and lambda question
Below I am using a filter with a lambda expression. Up until now, in my course, the lambda's argument (the "g" here) referred to a specific cell in a data frame, for instance by_company.apply(lambda x: x*2). How does it recognize that it's supposed to be looking at the 'Units' column if g refers to a specific cell? I didn't know you could filter on a cell to obtain a column. I assume this may relate to my previous question about groupby objects.

by_company = sales.groupby('Company')
by_company.filter(lambda g: g['Units'].sum() > 35)
http://pandas.pydata.org/pandas-docs/sta...filtration

When filtering grouped data (in pandas, which I'm guessing you're using), the function you pass filter is passed the entire group, not a particular cell. So yes, you can filter out certain groups based on one of the cells within that group.

...I guess. I've never actually used pandas.
(Apr-27-2017, 04:10 PM) nilamo wrote: http://pandas.pydata.org/pandas-docs/sta...filtration When filtering grouped data (in pandas, which I'm guessing you're using), the function you pass filter is passed the entire group, not a particular cell. So yes, you can filter out certain groups based on one of the cells within that group. ...I guess. I've never actually used pandas.

Thank you!
As nilamo said, filter "filters" entire subsets of the original dataframe. This ties in with your previous question about the groups attribute of a groupby object: for every unique value company in the Company column, your lambda receives the dataframe sales[sales.Company==company] as its argument, selects and sums the Units column, and either keeps the entire group (if the sum is bigger than 35) or drops it. So your result is a dataframe containing all rows of the companies that sold more than 35 units in total.
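To make the per-group behaviour concrete, here is a minimal sketch. The sample sales frame and its company names are made up for illustration (the original data is not shown in this thread); the second form, using transform, is just another way to get the same rows, not something from the course.

```python
import pandas as pd

# Hypothetical sample data; the real `sales` frame is not shown in the thread.
sales = pd.DataFrame({
    "Company": ["Acme", "Acme", "Initech", "Initech", "Hooli"],
    "Units": [20, 30, 10, 5, 40],
})

by_company = sales.groupby("Company")

# filter() calls the lambda once per group; g is a DataFrame holding
# all rows of one company, so g["Units"] is that company's Units column.
kept = by_company.filter(lambda g: g["Units"].sum() > 35)

# Same selection with a boolean mask: transform("sum") broadcasts each
# group's total back onto every row of that group.
totals = sales.groupby("Company")["Units"].transform("sum")
kept_via_mask = sales[totals > 35]

print(kept)
```

With this data, Acme (total 50) and Hooli (total 40) are kept with all of their rows, while Initech (total 15) is dropped entirely.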

Btw, df.apply() applies the function to an entire row or column; the function is applied to a single "cell" only when you use apply on a pandas Series. Quite often the two are interchangeable: if you want to apply a function to one column of a dataframe, you can use either (trivial example)
sales.apply(lambda g: 2 * g['Units'], axis=1)  # lambda function used on entire row, selects "column"
or
sales.Units.apply(lambda g: 2 * g)  # lambda function used on "cell" - column was already selected
Usually the second form is preferred.
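A quick way to convince yourself that both forms give the same numbers is to run them side by side. This is only a sketch on a made-up two-row frame; the third, vectorized line is an extra for comparison and was not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sales = pd.DataFrame({"Company": ["Acme", "Hooli"], "Units": [20, 40]})

# Row-wise: the lambda receives each row as a Series and picks "Units" from it.
row_wise = sales.apply(lambda row: 2 * row["Units"], axis=1)

# Element-wise: the column is selected first, so the lambda receives one value.
cell_wise = sales["Units"].apply(lambda units: 2 * units)

# Plain arithmetic on the column needs no apply at all.
vectorized = 2 * sales["Units"]

print(row_wise.tolist(), cell_wise.tolist(), vectorized.tolist())
```

All three produce the same values; the difference is only in what the lambda is handed (a whole row versus a single value).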