Oct-13-2020, 01:55 PM
Hi, I am new to machine learning and I am working on predicting patient outcomes using the data in their charts. I am having a hard time with feature selection because the columns have different units and some contain strings. Here is a small sample of the data.
Age 85,89,79
Cough yes,no,yes
WBC 24,29,89
O2Sat 57%,90%,85%
All four of these columns are different. Most of my data is yes/no, so at first I tried binary coding and changed all the yes and no values to 0s and 1s. This worked well, but the selection would also pick features like Age, because its average is always much higher than that of a binary column. I was using scikit-learn, but it only works with numeric values. Is there another program that I could use to help with my feature selection?
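To make the encoding concrete, here is a minimal sketch in plain Python using only the sample values above. The helper names `to_number` and `min_max_scale` are hypothetical, mapping yes to 1 and no to 0 is an assumed convention, and the min-max rescaling is only an assumption about one way to put Age and WBC on the same 0-to-1 range as the yes/no columns. I have not confirmed that it fixes the selection problem.

```python
# Sample values copied from the post; everything else is illustrative.
raw = {
    "Age": ["85", "89", "79"],
    "Cough": ["yes", "no", "yes"],
    "WBC": ["24", "29", "89"],
    "O2Sat": ["57%", "90%", "85%"],
}


def to_number(value):
    """Map yes/no to 1/0 (assumed convention), strip a trailing %, else parse as float."""
    text = value.strip().lower()
    if text in ("yes", "no"):
        return 1.0 if text == "yes" else 0.0
    return float(text.rstrip("%"))


def min_max_scale(values):
    """Rescale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Step 1: turn every column into floats.
numeric = {name: [to_number(v) for v in col] for name, col in raw.items()}

# Step 2: rescale each column so no feature stands out just because of its units.
scaled = {name: min_max_scale(col) for name, col in numeric.items()}

for name, col in scaled.items():
    print(name, [round(v, 3) for v in col])
```

After this step every column is numeric and lies between 0 and 1, so a selector that compares magnitudes across columns is no longer comparing years against 0/1 flags.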