Python Forum
Help with the Kaggle Titanic set: fill the missing Age with the median age by Pclass and Sex
#1
Hello All,
I am new to Python programming and am working through the Titanic data set from Kaggle for self-learning.
The columns of train_df are ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'].

Out of these, only Age has some missing values. I have computed the median age for each combination of passenger class and sex and stored the result as temp_df:
temp_df = train_df[['Pclass', 'Sex', 'Age']].groupby(['Pclass', 'Sex']).median().reset_index()

Pclass  Sex     Age
1       female  35
1       male    40
2       female  28
2       male    30
3       female  21.5
3       male    25

I have tried many approaches but have not managed to write Python code that fills each missing Age in train_df with the median for the matching Pclass and Sex.
Can you please help me with code for this problem?
Thank you in advance for your time and reply.

Regards,
Parth
#2
You are probably looking for this:

train_df['Age'] = train_df['Age'].fillna(train_df.groupby(['Sex', 'Pclass'])['Age'].transform('median'))
# from now on train_df['Age'] contains no NaNs (as long as every Sex/Pclass group has at least one known age)
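
To see what the line above does, here is a minimal sketch on a small made-up frame. The name toy_df and its values are invented for illustration and are not taken from the Kaggle data; only the column names match train_df.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_df; the values are made up for illustration.
toy_df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Sex": ["male", "male", "male", "female", "female", "female"],
    "Age": [40.0, np.nan, 50.0, 20.0, 30.0, np.nan],
})

# Median Age of each (Sex, Pclass) group, repeated on every row of that group.
# transform() returns a Series with the same index as toy_df.
group_median = toy_df.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Fill only the missing ages; known ages are left untouched.
toy_df["Age"] = toy_df["Age"].fillna(group_median)
filled_ages = toy_df["Age"].tolist()
print(filled_ages)
```

Because transform keeps the original index, fillna can line the medians up row by row, so no merge with temp_df is needed.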

I would also suggest taking the passenger's title (part of the Name column) into account, e.g. passengers titled 'Master' are young boys.
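
One possible way to pull the title out of Name is sketched below. It assumes names follow the 'Surname, Title. Given names' pattern; the regular expression and the variable names are my own choices for illustration.

```python
import pandas as pd

# Sample names in the 'Surname, Title. Given names' format.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
    "Palsson, Master. Gosta Leonard",
])

# The title is the text between the comma and the first period.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
title_list = titles.tolist()
print(title_list)
```

If the result is stored in a new column (say 'Title', a hypothetical name), it could be used as a grouping key in the same groupby/transform call, though rare titles may form groups with no known age at all.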
Another suggestion is to use the combined dataset (train and test together) to estimate the medians, i.e.
something like this:
train_df['Age'] = train_df['Age'].fillna(pd.concat([train_df, test_df]).groupby(['Sex', 'Pclass'])['Age'].transform('median').iloc[:train_df.shape[0]])
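
A small sketch of the combined-median idea on made-up data follows. The frames toy_train and toy_test are invented, and the sketch assumes the training frame has the default 0..n-1 index, so that the first rows of the combined frame line up with it.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for train_df and test_df.
toy_train = pd.DataFrame({
    "Pclass": [3, 3],
    "Sex": ["male", "male"],
    "Age": [20.0, np.nan],
})
toy_test = pd.DataFrame({
    "Pclass": [3, 3],
    "Sex": ["male", "male"],
    "Age": [30.0, 40.0],
})

# Stack train on top of test; ignore_index=True gives a fresh 0..n-1 index,
# which matches toy_train's index for the first len(toy_train) rows.
combined = pd.concat([toy_train, toy_test], ignore_index=True)
combined_median = combined.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Keep only the medians that belong to the training rows, then fill.
train_part = combined_median.iloc[:len(toy_train)]
toy_train["Age"] = toy_train["Age"].fillna(train_part)
train_ages = toy_train["Age"].tolist()
print(train_ages)
```

Here the missing age is filled with the median over all three known ages rather than the single known age in the training rows.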
#3
(Nov-21-2018, 01:30 AM)scidam Wrote: You are probably looking for this:

train_df['Age'] = train_df['Age'].fillna(train_df.groupby(['Sex', 'Pclass'])['Age'].transform('median'))
train_df['Age'] = train_df['Age'].fillna(pd.concat([train_df, test_df]).groupby(['Sex', 'Pclass'])['Age'].transform('median').iloc[:train_df.shape[0]])
Thank you!