Python Forum
Apply Method Question
#1
I hope you are all having a good day. I have a question in regards to the apply function. Below is a function I created and then I apply it to my data frame:
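def impute_age(cols):
    # cols is one row of train[['Age', 'Pclass']] when called with axis=1
    Age = cols['Age']
    Pclass = cols['Pclass']

    if pd.isnull(Age):
        # Fill a missing age with a fixed value chosen by passenger class
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age

train['Age'] = train[['Age','Pclass']].apply(impute_age, axis=1)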
Why, when I don't specify axis=1, does the code not correctly replace the null values in the Age column? I understand axis=1 to mean applying the function to the columns, but I don't get why applying it to the rows (axis=0) doesn't work.

Moderator nilamo: Please use code tags in the future
#2
If you call df.apply(func) without an axis argument, func is applied to each column of the dataframe. So in your case impute_age is called first on the entire "Age" column and then on the entire "Pclass" column, and the result is a series with only two elements: the first is based on the first two values of "Age", the second on the first two values of "Pclass".
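
To make the difference concrete, here is a minimal sketch on a made-up four-row frame. The data is hypothetical and only stands in for the real train set, and impute_age reads its two values by position with .iloc, which is an adaptation of the function from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the thread's `train` frame.
train = pd.DataFrame({
    "Age": [22.0, np.nan, 40.0, np.nan],
    "Pclass": [3, 1, 2, 2],
})

def impute_age(cols):
    # Positional access: "first value" and "second value" of whatever is passed in.
    age = cols.iloc[0]
    pclass = cols.iloc[1]
    if pd.isnull(age):
        if pclass == 1:
            return 37
        elif pclass == 2:
            return 29
        return 24
    return age

# Default axis=0: impute_age is called once per COLUMN, so `cols` is a whole
# column and the result has one entry per column name.
by_column = train[["Age", "Pclass"]].apply(impute_age)

# axis=1: impute_age is called once per ROW, so `cols` holds that row's
# Age and Pclass and the result has one entry per row.
by_row = train[["Age", "Pclass"]].apply(impute_age, axis=1)

print(by_column)
print(by_row)
```

Assigning by_column back to train['Age'] cannot line up with the rows, because its index is the column names rather than the row labels. That is why the missing ages are not filled in without axis=1.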

You don't need .apply for this imputation; you can do it directly by assignment. Either one by one:
# Use .loc to avoid chained assignment, which may fail to update train
train.loc[train.Age.isnull() & (train.Pclass == 1), 'Age'] = 37
train.loc[train.Age.isnull() & (train.Pclass == 2), 'Age'] = 29
# Every remaining missing age gets 24, matching the else branch of impute_age
train.loc[train.Age.isnull(), 'Age'] = 24
or using something more complicated like a nested np.where:
mask = train.Age.isnull()
train.loc[mask, 'Age'] = np.where(train.Pclass == 1, 37, np.where(train.Pclass == 2, 29, 24))[mask]
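
Another way to write the same imputation, sketched here as an alternative that was not proposed in the thread, is to map Pclass to a fill value and pass the result to fillna. The toy frame and the names fill_by_class and fill_values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the thread's `train` frame.
train = pd.DataFrame({
    "Age": [22.0, np.nan, 40.0, np.nan, np.nan],
    "Pclass": [3, 1, 2, 2, 3],
})

# Fill value per class; any class not listed falls back to 24,
# mirroring the else branch of impute_age.
fill_by_class = {1: 37, 2: 29}
fill_values = train["Pclass"].map(fill_by_class).fillna(24)

# fillna aligns on the row index, so only the missing ages are replaced.
train["Age"] = train["Age"].fillna(fill_values)

print(train)
```

This keeps the class-to-age table in one place, which may be easier to extend than nested np.where calls.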
#3
(Apr-07-2017, 06:13 PM)zivoni Wrote: If you use df.apply(func), then func is applied to a columns of dataframe. ...

Thank you for the response. However, I am still confused about what happens when axis=0. Can you dumb it down for me, please?
#4
df.apply(func, axis=0) is exactly the same as df.apply(func), because the default value for axis is 0. As I mentioned in my previous post, in this case .apply applies the function func to entire columns. Here is a simple example in which func prints its argument and some information about it:
Output:
In [2]: df = pd.DataFrame({'a':[1,2,3], 'b':[5,6,7]})

In [3]: df
Out[3]:
   a  b
0  1  5
1  2  6
2  3  7

In [4]: def func(s):
   ...:     print("=== type: {},  shape: {}".format(type(s), s.shape))
   ...:     print(s)
   ...:

In [5]: apply_result = df.apply(func)
=== type: <class 'pandas.core.series.Series'>,  shape: (3,)
0    1
1    2
2    3
Name: a, dtype: int64
=== type: <class 'pandas.core.series.Series'>,  shape: (3,)
0    5
1    6
2    7
Name: b, dtype: int64

In [6]: apply_result
Out[6]:
a    None
b    None
dtype: object
As you can see, func is applied to column "a" first and then to column "b". The result is a series whose index is the column index of the original dataframe, and it contains None values because func has no return statement.
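
For comparison, the row-wise case can be sketched the same way. This is a minimal illustration, assuming the same small df; instead of printing, func records what it receives in a list named seen, which is a hypothetical helper added only for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

seen = []  # hypothetical helper: records what func receives on each call

def func(s):
    # s.name is the row label, s.index holds the column names
    seen.append((s.name, list(s.index), s.tolist()))

# axis=1: func is called once per row instead of once per column
apply_result = df.apply(func, axis=1)

for name, index, values in seen:
    print(name, index, values)
print(apply_result)
```

With axis=1 each call receives a series indexed by the column names 'a' and 'b', and the result has one entry per row of df, which is the shape needed to assign back to a column.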
#5
(Apr-08-2017, 08:44 AM)zivoni Wrote: df.apply(func, axis=0) is exactly same as df.apply(func) - default value for axis is 0. ...

Thank you for your help.