Python Forum
Filter and lambda question
#1
Below I am using a filter with a lambda expression. Up until now in my course, the lambda's argument (the "g" here) has referred to a specific cell in a data frame, for instance by_company.apply(lambda x: x*2). How does it recognize that it's supposed to be looking at the 'Units' column if g refers to a specific cell? I didn't know you could filter on a cell to obtain a column. I assume this may relate to my previous question about groupby objects.

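# sales is a DataFrame with (at least) 'Company' and 'Units' columns
by_company = sales.groupby('Company')
# keep only the companies whose total Units exceed 35
by_company.filter(lambda g: g['Units'].sum() > 35)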
#2
http://pandas.pydata.org/pandas-docs/sta...filtration

When filtering grouped data (in pandas, which I'm guessing you're using), the function you pass to filter is passed the entire group (a sub-dataframe), not a particular cell. So yes, you can filter out whole groups based on the values in one of that group's columns.

...I guess. I've never actually used pandas.
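A minimal sketch to check this, assuming a small hypothetical `sales` frame (the real data isn't shown in the thread) and a hypothetical named function `keep_big_sellers` used in place of the lambda so that each call can be recorded. It logs the type and row count of whatever filter passes in:

```python
import pandas as pd

# Hypothetical sample data; the poster's real `sales` frame is not shown.
sales = pd.DataFrame({
    "Company": ["Acme", "Acme", "Hooli", "Hooli", "Mediacore"],
    "Units": [20, 25, 5, 10, 40],
})

seen = []  # records (type name, row count) of each argument filter passes in

def keep_big_sellers(g):
    # g is the whole sub-DataFrame for one company, not a single cell
    seen.append((type(g).__name__, len(g)))
    return g["Units"].sum() > 35

result = sales.groupby("Company").filter(keep_big_sellers)
print(seen)
print(result)
```

If the function received single cells, `seen` would hold one entry per value; instead it holds one entry per company, and each entry is a DataFrame.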
#3
(Apr-27-2017, 04:10 PM)nilamo Wrote: http://pandas.pydata.org/pandas-docs/sta...filtration When filtering grouped data (in pandas, which I'm guessing you're using), the function you pass filter is passed the entire group, not a particular cell. So yes, you can filter out certain groups based on one of the cells within that group. ...I guess. I've never actually used pandas.

Thank you!
#4
As nilamo said, filter "filters" entire subsets of the original dataframe. It is similar to your previous question about the groups attribute of a groupby object: for every unique value company in the Company column, your lambda gets the dataframe sales[sales.Company == company] as its argument, selects and sums that group's Units column, and then either keeps the entire group (if the sum is bigger than 35) or drops it. So your result is a dataframe containing all rows of the companies that sold more than 35 units in total.
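Here is a sketch of that loop written out by hand, again on hypothetical toy data since the original `sales` frame isn't shown. It builds the result the long way and compares it with the one-liner from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the poster's `sales` frame.
sales = pd.DataFrame({
    "Company": ["Acme", "Hooli", "Acme", "Mediacore", "Hooli"],
    "Units": [20, 5, 25, 40, 10],
})

# Roughly what groupby(...).filter(...) does, written out by hand:
kept_parts = []
for company in sales["Company"].unique():
    group = sales[sales.Company == company]  # the whole group for one company
    if group["Units"].sum() > 35:            # the lambda's test
        kept_parts.append(group)             # keep every row of this group
manual = pd.concat(kept_parts).sort_index()  # restore original row order

# The one-liner from the question
filtered = sales.groupby("Company").filter(lambda g: g["Units"].sum() > 35)

print(manual.equals(filtered))
```

`filter` returns the kept rows in their original order, which is why the manual version sorts the index before comparing.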

Btw, df.apply() applies a function to an entire row or column (each column by default, each row with axis=1); applying a function to a single "cell" holds only if you use apply on a pandas Series. Quite often the two are interchangeable: if you want to apply a function to a column of a dataframe, you can use either (trivial example)
sales.apply(lambda g: 2 * g['Units'], axis=1)  # lambda receives an entire row; g['Units'] selects that row's Units value
or
sales.Units.apply(lambda g: 2 * g)  # lambda receives a single value ("cell"); the column was already selected
Usually the second form is preferred.
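A small sketch comparing the two forms, again on hypothetical toy data. It also adds a third option, plain vectorized arithmetic, which is not from the post but is standard pandas and usually the simplest choice for something like doubling a column:

```python
import pandas as pd

# Hypothetical sample data; not from the original thread.
sales = pd.DataFrame({
    "Company": ["Acme", "Hooli", "Mediacore"],
    "Units": [20, 5, 40],
})

# DataFrame.apply with axis=1: the lambda receives one whole row (a Series)
row_wise = sales.apply(lambda row: 2 * row["Units"], axis=1)

# Series.apply: the column is selected first, so the lambda receives one value
cell_wise = sales.Units.apply(lambda u: 2 * u)

# Vectorized arithmetic on the whole column, no lambda at all
vectorized = 2 * sales.Units

print(row_wise.tolist(), cell_wise.tolist(), vectorized.tolist())
```

All three give the same values; the row-wise form does the most work per call because it builds a Series for every row.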

