Python Forum
Help with the Kaggle Titanic set: filling missing Age with the median age per Pclass and Sex
#2
You are probably looking for this:

train_df['Age'] = train_df['Age'].fillna(train_df.groupby(['Sex', 'Pclass'])['Age'].transform('median'))
# from now on train_df['Age'] contains no NaNs, provided every (Sex, Pclass) group has at least one known age

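I would also suggest taking the passenger's title (it can be parsed from the Name column) into account, e.g. a 'Master' is a young boy, so the median age per title is usually a better guess than the median per Sex and Pclass alone.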
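
To illustrate the title idea, here is a minimal sketch on made-up data. The names, ages, the toy_df variable and the regular expression are my own assumptions for illustration; it only assumes that names follow the "Surname, Title. Given names" pattern.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not taken from the Kaggle files).
toy_df = pd.DataFrame({
    "Name": [
        "Alpha, Mr. Adam",
        "Beta, Master. Ben",
        "Gamma, Master. Carl",
        "Delta, Miss. Dora",
        "Epsilon, Mr. Evan",
        "Zeta, Master. Finn",
    ],
    "Age": [22.0, 2.0, 4.0, 26.0, np.nan, np.nan],
})

# Take the text between the comma and the first period as the title.
toy_df["Title"] = (
    toy_df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
)

# Median age per title, broadcast back to every row of that title.
title_median = toy_df.groupby("Title")["Age"].transform("median")

# Fill only the missing ages; known ages are left untouched.
toy_df["Age"] = toy_df["Age"].fillna(title_median)
```

On real data some titles are rare, so a group may have no known age at all and its rows stay NaN; falling back to the Sex/Pclass median for those rows is one option.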
Another suggestion is to estimate the medians from the combined dataset (train and test together), i.e. something like this:
train_df['Age'] = train_df['Age'].fillna(pd.concat([train_df, test_df]).groupby(['Sex', 'Pclass'])['Age'].transform('median').iloc[:train_df.shape[0]])
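
A small self-contained sketch of the combined-dataset variant, again on made-up numbers. The train and test frames here are hypothetical stand-ins, and the sketch assumes the train frame has the default 0..n-1 index so that the sliced medians line up with it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real train/test frames.
train = pd.DataFrame({
    "Sex": ["male", "male", "female"],
    "Pclass": [3, 3, 1],
    "Age": [np.nan, 20.0, np.nan],
})
test = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Pclass": [3, 1, 1],
    "Age": [30.0, 40.0, 50.0],
})

# Stack both frames; ignore_index gives a fresh 0..n-1 index,
# so the first len(train) rows keep the same labels as train.
combined = pd.concat([train, test], ignore_index=True)

# Per-group median computed over train and test rows together.
combined_median = combined.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Keep only the part that belongs to train and fill its gaps.
train["Age"] = train["Age"].fillna(combined_median.iloc[:len(train)])
```

The same combined medians can be used for the test frame by taking the remaining rows, combined_median.iloc[len(train):], and resetting their index before calling fillna.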

Messages In This Thread
RE: help for Kaggle Titanic Set fill the missing Age by median age of Pclass and Sex - by scidam - Nov-21-2018, 01:30 AM
