Nov-21-2018, 12:32 AM
Hello All,
I am new to Python programming and am working through the Titanic data set from Kaggle for self-learning.
The columns of train_df are ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'].
Of these columns, Age is the one whose missing values I want to fill. I computed the median age for each combination of passenger class and sex and stored the result as temp_df:
temp_df = train_df[['Pclass', 'Sex', 'Age']].groupby(['Pclass', 'Sex']).median().reset_index()

Pclass  Sex     Age
1       female  35
1       male    40
2       female  28
2       male    30
3       female  21.5
3       male    25
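For reference, here is the same grouping step on a small made-up frame (the values are hypothetical, not the Kaggle data). It shows that the groupby median ignores NaN ages when computing each (Pclass, Sex) value:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df (not the real Kaggle file)
train_df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Age": [30.0, 40.0, 50.0, 20.0, np.nan, 18.0],
})

# GroupBy.median() skips NaN by default, so missing ages do not affect the medians
temp_df = (
    train_df[["Pclass", "Sex", "Age"]]
    .groupby(["Pclass", "Sex"])
    .median()
    .reset_index()
)
print(temp_df)
```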
I have tried several approaches but have not managed to write Python code that fills each missing Age value in train_df with the median from temp_df for the matching Pclass and Sex.
Could you please help me with the code for this step?
Thank you in advance for your time and reply.
Regards,
Parth
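A minimal sketch of two common pandas ways to do this, run on a small made-up frame (hypothetical values, only the three relevant columns). The first uses groupby(...).transform("median"), which returns each row's group median aligned to train_df's index. The second reuses a temp_df-style lookup table through a left merge; the MedianAge column name is an assumption introduced here to avoid a clash with Age. Both finish with Series.fillna, so ages that are already present are left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df
train_df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Age": [30.0, np.nan, 30.0, 20.0, np.nan, 21.5],
})

# Option 1: per-row group median via transform(), then fill only the NaNs
group_median = train_df.groupby(["Pclass", "Sex"])["Age"].transform("median")
filled_1 = train_df["Age"].fillna(group_median)

# Option 2: build the lookup table (like temp_df) and merge it onto every row
temp_df = (
    train_df[["Pclass", "Sex", "Age"]]
    .groupby(["Pclass", "Sex"])
    .median()
    .reset_index()
    .rename(columns={"Age": "MedianAge"})  # hypothetical name to avoid Age_x/Age_y
)
# Keep the original row labels in an "index" column so the result realigns
merged = (
    train_df.reset_index()
    .merge(temp_df, on=["Pclass", "Sex"], how="left")
    .set_index("index")
)
filled_2 = train_df["Age"].fillna(merged["MedianAge"])

# Write the filled values back into the original frame
train_df["Age"] = filled_1
print(train_df)
```

The transform version is usually the shorter choice because it needs no separate table. Note that if every Age in some (Pclass, Sex) group is missing, that group's median is itself NaN and those rows stay empty.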