help for Kaggle Titanic Set fill the missing Age by median age of Pclass and Sex

Parthasarathi009 · Nov-21-2018, 12:32 AM

Hello All,
I am new to Python programming and am working through the Titanic data set from Kaggle for self-learning.
The columns of train_df are ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'].

Out of these, only Age has some missing values. I have computed the median age for each combination of passenger class and sex and stored the result in temp_df:
temp_df=train_df[['Pclass', 'Sex','Age']].groupby(['Pclass','Sex']).median().reset_index()

Pclass  Sex     Age
1       female  35
1       male    40
2       female  28
2       male    30
3       female  21.5
3       male    25

I have tried many approaches but have not managed to write Python code that fills in the missing Age values in train_df with the median for the matching Pclass and Sex.
Could you please help me with the code for this?
Thank you in advance for your time and reply.

Regards,
Parth

**scidam** · (This post was last modified: Nov-21-2018, 01:31 AM by scidam.)

You are probably looking for this:

train_df['Age'] = train_df['Age'].fillna(train_df.groupby(['Sex', 'Pclass'])['Age'].transform('median'))

# from now on, train_df.Age doesn't contain NaNs
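To see what the one-liner does, here is a minimal sketch on a toy frame. The name toy_df and all values in it are made up for illustration; they are not the Kaggle data. transform('median') returns one value per row (the median of that row's own group), so it lines up with the Age column and can be passed straight to fillna.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_df (made-up values, not the Kaggle data)
toy_df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Age": [40.0, 50.0, np.nan, 20.0, np.nan, 30.0],
})

# One median per row: the median Age of that row's (Sex, Pclass) group.
# NaN values are skipped when the median is computed.
group_median = toy_df.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Only the missing ages are replaced; existing ages stay as they are.
toy_df["Age"] = toy_df["Age"].fillna(group_median)
print(toy_df)
```

Assigning the result back to the column, rather than calling fillna with inplace=True on train_df.Age, avoids modifying a temporary copy.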

I would also suggest taking the 'title' property into account, e.g. passengers titled Master are young, etc.
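A rough sketch of the title idea, assuming names follow the "Surname, Title. Given names" pattern. The names and ages below are invented for illustration, and the Title column and the regular expression are my own choices, not part of the original data set.

```python
import numpy as np
import pandas as pd

# Invented names in the "Surname, Title. Given names" pattern (not real data)
toy_df = pd.DataFrame({
    "Name": [
        "Smith, Mr. John",
        "Smith, Master. Tom",
        "Jones, Miss. Anna",
        "Brown, Master. Sam",
        "Brown, Mr. Paul",
    ],
    "Age": [22.0, 2.0, 26.0, np.nan, np.nan],
})

# Take the text between the comma and the first period as the title
toy_df["Title"] = (
    toy_df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
)

# Fill missing ages with the median age of passengers sharing the title
title_median = toy_df.groupby("Title")["Age"].transform("median")
toy_df["Age"] = toy_df["Age"].fillna(title_median)
print(toy_df)
```

On real data, rare titles may have no known ages at all, in which case their median is NaN and those rows would still need a fallback such as the Sex/Pclass median.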
Another suggestion is to use the combined dataset (train plus test) to estimate the medians, i.e. something like this:

train_df['Age'] = train_df['Age'].fillna(pd.concat([train_df, test_df]).groupby(['Sex', 'Pclass'])['Age'].transform('median').iloc[:train_df.shape[0]])
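The iloc slice above relies on the train rows coming first in the concatenated frame and on train_df having a unique index. A slightly more explicit sketch, assuming you are happy to label the two parts with keys, is shown below. The names toy_train and toy_test and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for train_df and test_df (made-up values)
toy_train = pd.DataFrame({
    "Pclass": [1, 1, 3],
    "Sex": ["male", "male", "female"],
    "Age": [40.0, np.nan, np.nan],
})
toy_test = pd.DataFrame({
    "Pclass": [1, 3, 3],
    "Sex": ["male", "female", "female"],
    "Age": [50.0, 20.0, 30.0],
})

# keys= adds an outer index level, so train rows can be selected by label
combined = pd.concat([toy_train, toy_test], keys=["train", "test"])

# Group medians computed over train and test together
combined_median = combined.groupby(["Sex", "Pclass"])["Age"].transform("median")

# .loc["train"] drops the outer level, restoring toy_train's own index
toy_train["Age"] = toy_train["Age"].fillna(combined_median.loc["train"])
print(toy_train)
```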

Parthasarathi009 · Nov-21-2018, 06:50 PM

(Nov-21-2018, 01:30 AM)scidam Wrote: You are probably looking for this:

train_df['Age'] = train_df['Age'].fillna(train_df.groupby(['Sex', 'Pclass'])['Age'].transform('median'))

train_df['Age'] = train_df['Age'].fillna(pd.concat([train_df, test_df]).groupby(['Sex', 'Pclass'])['Age'].transform('median').iloc[:train_df.shape[0]])

Thank You
