Join dataframes... So simple but I can't work it out! - snakes - Oct-26-2021
I have a dataframe which contains NaNs. I've copied the rows with NaNs and used a DecisionTreeRegressor to predict the missing data. Now, I want to join the predictions dataframe to the original dataframe but the index of each dataframe is getting in the way. Please can someone help me join the two dataframes.
Index 13 joins fine but after that it's back to the NaNs! I want to preserve the 13, 14, 48, 50... index so I can join it back into the dataframe that has 0-12, 15-47 etc.
# Use DecisionTreeRegressor to predict gender NaNs
predictions = tree_reg.predict(nans_test)
Output: array([1., 1., 1., 2., 1., 1., 2., 1., 2., 2., 1., 2., 2., 2.])
# Convert to dataframe
gender_predictions = pd.DataFrame(predictions, columns=['gender'])
Output:
   gender
0 1.0
1 1.0
2 1.0
3 2.0
4 1.0
5 1.0
6 2.0
7 1.0
8 2.0
9 2.0
10 1.0
11 2.0
12 2.0
13 2.0
# Join predictions to nans_test
nans_test = nans_test.join(gender_predictions)
Output:
    age usage_meeting_place usage_worship usage_arts usage_wellbeing usage_connections usage_model_sustainability usage_flexible_community/church usage_services usage_festivals usage_reflection modify_exterior modify_interior modify_sustainable_building modify_layout modify_cafe gender
13 2 4 2 4 4 4 2 2 2 2 2 3 3 2 4 4 2.0
14 1 4 4 4 4 4 1 4 1 1 0 2 3 4 3 0 NaN
48 1 3 3 3 3 3 2 3 3 2 3 3 3 2 3 3 NaN
50 2 3 3 3 3 3 3 3 3 1 3 2 3 3 3 3 NaN
55 1 2 1 1 1 3 1 2 0 0 0 3 2 2 3 1 NaN
61 2 4 2 3 4 4 4 4 2 2 1 3 3 4 3 3 NaN
71 2 3 3 3 3 3 3 4 3 1 2 3 4 4 3 4 NaN
73 2 4 2 4 4 4 4 3 2 2 4 3 2 4 3 4 NaN
83 1 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 NaN
84 2 4 2 4 3 4 4 3 2 2 3 2 2 3 3 3 NaN
103 2 4 4 4 4 4 4 4 4 2 2 2 3 4 2 3 NaN
106 1 3 3 2 3 3 2 4 3 3 2 2 2 2 3 2 NaN
149 1 3 4 3 3 3 3 3 4 2 3 2 2 4 1 3 NaN
162 0 4 2 2 4 3 3 3 3 3 3 1 2 3 1 3 NaN
Thank you!
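For anyone hitting the same thing: `DataFrame.join` aligns rows on index labels, not on position. `gender_predictions` gets a default 0-13 index, while `nans_test` keeps labels 13, 14, 48 and so on, so the only label the two share is 13. That also means the 2.0 joined at row 13 is the prediction in position 13 (the last one), not the first. Below is a minimal sketch of one way around this, assuming `predictions` is in the same row order as `nans_test`. The small frames are made-up stand-ins, not the real survey data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for nans_test: four rows that keep their original labels.
nans_test = pd.DataFrame({"age": [2, 1, 1, 2]}, index=[3, 14, 48, 50])
predictions = np.array([1.0, 1.0, 1.0, 2.0])

# Default index 0..3: join matches on labels, so only label 3 lines up,
# and it receives the prediction stored at position 3, not position 0.
misaligned = nans_test.join(pd.DataFrame(predictions, columns=["gender"]))

# Reusing nans_test.index gives each prediction the label of the row it was made for.
gender_predictions = pd.DataFrame(
    predictions, columns=["gender"], index=nans_test.index
)
aligned = nans_test.join(gender_predictions)

print(misaligned)
print(aligned)
```

Passing `index=nans_test.index` avoids typing the labels out by hand, which matters once there are more than a handful of rows.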
RE: Join dataframes... So simple but I can't work it out! - snakes - Oct-27-2021
Given it was only 14 rows long I just added the index numbers to the line converting the array to a dataframe...
# Convert to dataframe
gender_predictions = pd.DataFrame(predictions, columns=['gender'], index=[13,14,48,50,55,61,71,73,83,84,103,106,149,162])
My next issue (!)...
Join/merge/concat the two dataframes. I'm going round in circles chasing an answer I'm not sure exists!
I just want to 'merge' nans_train and nans_test_tr so that the index/ID is sorted in order. Here is a section that shows 13 and 14 missing...
Output: nans_train
ID age gender usage_meeting_place usage_worship usage_arts usage_wellbeing usage_connections usage_model_sustainability usage_flexible_community/church usage_services usage_festivals usage_reflection modify_exterior modify_interior modify_sustainable_building modify_layout modify_cafe
0 3 1 2.0 3 3 2 1 3 4 4 2 3 1 3 4 2 3 1
1 4 2 1.0 4 4 4 4 4 3 4 4 4 3 4 4 4 4 4
2 5 2 1.0 4 3 4 4 4 4 4 4 4 3 4 4 4 4 4
3 6 2 2.0 4 2 4 4 4 4 4 2 2 3 2 3 4 4 4
4 7 2 1.0 4 3 4 1 1 0 4 2 2 2 4 4 4 4 0
5 8 2 2.0 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
6 9 1 2.0 3 2 3 3 3 3 2 2 1 1 3 3 4 3 3
7 10 1 2.0 4 2 2 3 4 2 2 2 2 4 3 2 2 2 4
8 11 1 1.0 4 0 3 3 4 3 2 2 3 2 3 4 3 4 4
9 12 3 1.0 4 4 4 4 4 4 4 4 4 4 3 3 3 4 4
10 13 2 1.0 4 4 4 3 3 3 3 3 4 2 4 4 4 4 4
11 14 3 1.0 4 4 4 4 4 4 4 4 4 4 2 3 3 2 3
12 15 2 2.0 4 3 3 4 4 4 4 3 3 3 3 4 4 3 4
15 18 2 2.0 4 4 4 4 4 4
Output: nans_test_tr
ID age usage_meeting_place usage_worship usage_arts usage_wellbeing usage_connections usage_model_sustainability usage_flexible_community/church usage_services usage_festivals usage_reflection modify_exterior modify_interior modify_sustainable_building modify_layout modify_cafe gender
13 16 2 4 2 4 4 4 2 2 2 2 2 3 3 2 4 4 2.0
14 17 1 4 4 4 4 4 1 4 1 1 0 2 3 4 3 0 1.0
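One possible way to combine the two frames, as a sketch: `pd.concat` stacks them row-wise and matches columns by name (so it should not matter that `gender` sits in a different position in each), and `sort_index` then puts the labels back in order. The frames below are cut-down stand-ins using a few values from the output above, and `combined` is just an illustrative name.

```python
import pandas as pd

# Cut-down stand-ins for the two frames; note gender is in a different column position.
nans_train = pd.DataFrame(
    {"ID": [15, 18], "age": [2, 2], "gender": [2.0, 2.0], "usage_meeting_place": [4, 4]},
    index=[12, 15],
)
nans_test_tr = pd.DataFrame(
    {"ID": [16, 17], "age": [2, 1], "usage_meeting_place": [4, 4], "gender": [2.0, 1.0]},
    index=[13, 14],
)

# Stack the rows (columns are aligned by name), then sort by index label.
combined = pd.concat([nans_train, nans_test_tr]).sort_index()

print(combined)
```

The column order of the result follows the first frame passed to `pd.concat`, so putting `nans_train` first keeps `gender` next to `age`.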