To start, I'll just say that I do a lot of work with Python, but I'm venturing into new territory with math/data plotting, so bear with me. My dataset has 4 columns: person, x and y coordinates, and a binary response to those coordinates. With this data I'm looking to do a few different things:
- Return a probability value for each set of x,y coordinates
- Create some sort of graph (heatmap/density?) that will show the likelihood of 0/1 for areas of the graph
- Evaluate subsets of the data using the 'person' column
Based on the research I've done, sklearn.linear_model LogisticRegression seems to be the best way to go about this (I have also toyed with pyGAM). As my script shows, the furthest I've gotten is running the "predict_proba" function on the dataset, but either I'm doing something wrong elsewhere or I just don't know how to interpret the results, because they seem way off. If anyone can help me with this I would really appreciate it.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import linear_model

    # frame is the pandas DataFrame loaded earlier (not shown)
    data_df = frame[['person', 'x_value', 'y_value', 'binary_result']]

    # Create a scatter plot of the x,y coordinates with regard to their binary result
    fig = plt.figure(figsize=(4, 4))
    ax = fig.add_subplot(1, 1, 1)
    bin_res = [0, 1]
    bin_col = ['r', 'g']
    for res, col in zip(bin_res, bin_col):
        plot_df = data_df[data_df['binary_result'] == res]
        ax.scatter(plot_df['x_value'], plot_df['y_value'], c=col, marker='.')
    plt.show()

    # Execute logistic regression on the dataset
    x = data_df[['x_value', 'y_value']]
    y = data_df[['binary_result']]
    log_reg = linear_model.LogisticRegression(solver='lbfgs').fit(x, np.ravel(y))
    predictions = log_reg.predict(x)
    # One row per point, one column per class (column order follows log_reg.classes_)
    predict_a = log_reg.predict_proba(x)
    print(predict_a)

*FYI I've also asked this question on StackOverflow
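
For anyone who wants something runnable to poke at, here is a minimal sketch of the three goals above on made-up data, not my real dataset. Everything that isn't in my script is an assumption for illustration: the synthetic `demo_df`, the helper names `fit_model` and `prob_of_one`, the grid range of -3 to 3, and the output file name `probability_surface.png`. One thing it makes explicit is that `predict_proba` returns one column per class in the order given by `classes_`, so with 0/1 labels the second column is the probability of 1.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (hypothetical values, same column names).
rng = np.random.default_rng(0)
n = 400
demo_df = pd.DataFrame({
    "person": rng.choice(["a", "b"], size=n),
    "x_value": rng.uniform(-3, 3, size=n),
    "y_value": rng.uniform(-3, 3, size=n),
})
# Make a response of 1 more likely toward the upper right of the plane.
true_p = 1 / (1 + np.exp(-(demo_df["x_value"] + demo_df["y_value"])))
demo_df["binary_result"] = (rng.uniform(size=n) < true_p).astype(int)

features = ["x_value", "y_value"]


def fit_model(df):
    """Fit a logistic regression of binary_result on the x,y coordinates."""
    return LogisticRegression(solver="lbfgs").fit(df[features], df["binary_result"])


def prob_of_one(model, df):
    """Return P(binary_result == 1) per row, locating the column via classes_."""
    col = list(model.classes_).index(1)
    return model.predict_proba(df[features])[:, col]


# Goal 1: a probability value for each set of x,y coordinates.
model = fit_model(demo_df)
demo_df["p_one"] = prob_of_one(model, demo_df)

# Goal 2: evaluate the fitted model on a regular grid and draw it as a heatmap.
gx, gy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
grid_df = pd.DataFrame({"x_value": gx.ravel(), "y_value": gy.ravel()})
grid_p = prob_of_one(model, grid_df).reshape(gx.shape)

fig, ax = plt.subplots(figsize=(4, 4))
mesh = ax.pcolormesh(gx, gy, grid_p, cmap="RdYlGn", vmin=0, vmax=1, shading="auto")
fig.colorbar(mesh, ax=ax, label="P(binary_result = 1)")
ax.scatter(demo_df["x_value"], demo_df["y_value"], c=demo_df["binary_result"],
           cmap="RdYlGn", marker=".", edgecolors="k", linewidths=0.2)
fig.savefig("probability_surface.png")  # or plt.show() in an interactive session

# Goal 3: one model per person, each fitted on that person's rows only.
per_person = {name: fit_model(group) for name, group in demo_df.groupby("person")}
```

Note that a plain logistic regression on x and y can only produce a surface that changes smoothly in one direction across the plane (a straight decision boundary), so if the real data has pockets or curved regions, the probabilities from this kind of model may well look "off" even when the code is correct.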