To start, I'll just say that I do a lot of work with Python, but I'm venturing into new territory with math/data plotting, so bear with me. My dataset has 4 columns: person, x and y coordinates, and a binary response to those coordinates. With this data I'm looking to do a few different things:
- Return a probability value for each set of x,y coordinates
- Create some sort of graph (heatmap/density?) that will show the likelihood of 0/1 for areas of the graph
- Evaluate subsets of the data using the 'person' column
Based on the research I've done, LogisticRegression from sklearn.linear_model seems to be the best way to go about this (I have also toyed with pyGAM). As my script shows, the furthest I've gotten is running the "predict_proba" function on the dataset, but either I'm doing something wrong elsewhere or I just don't know how to interpret the results, because they seem way off. If anyone can help me with this I would really appreciate it.
*FYI: I've also asked this question on Stack Overflow.
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model

    # `frame` is the full DataFrame loaded earlier (not shown)
    data_df = frame[['person', 'x_value', 'y_value', 'binary_result']]

    # Create a scatter plot of the x,y coordinates with regard to their binary result
    fig = plt.figure(figsize=(4, 4))
    ax = fig.add_subplot(1, 1, 1)
    bin_res = [0, 1]
    bin_col = ['r', 'g']
    for res, col in zip(bin_res, bin_col):
        plot_df = data_df[data_df['binary_result'] == res]
        ax.scatter(plot_df['x_value'], plot_df['y_value'], c=col, marker='.')
    plt.show()

    # Execute logistic regression on the dataset
    x = data_df[['x_value', 'y_value']]
    y = data_df[['binary_result']]
    log_reg = linear_model.LogisticRegression(solver='lbfgs').fit(x, np.ravel(y))
    predictions = log_reg.predict(x)
    predict_a = log_reg.predict_proba(x)  # one column per class, ordered as in log_reg.classes_
    print(predict_a)
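For reference, here is a minimal sketch of how the three goals above might fit together. It reads the predict_proba output per row, evaluates the model on a grid for a heatmap, and compares people using the 'person' column. Everything in it is an assumption made for illustration: the data is synthetic (`demo_df` stands in for the real dataset, keeping the same column names), and the helpers `probability_grid` and `plot_heatmap` are hypothetical names, not part of sklearn. The one library fact it relies on is that predict_proba returns one column per class, in the order given by `classes_`, so with labels 0/1 the second column is the probability of 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real data (made-up values, same column names).
rng = np.random.default_rng(0)
n = 600
demo_df = pd.DataFrame({
    "person": rng.choice(["a", "b", "c"], size=n),
    "x_value": rng.uniform(-3, 3, size=n),
    "y_value": rng.uniform(-3, 3, size=n),
})
# Assumed ground truth for the demo: a 1 gets more likely as x + y grows.
true_p = 1 / (1 + np.exp(-(demo_df["x_value"] + demo_df["y_value"])))
demo_df["binary_result"] = (rng.uniform(size=n) < true_p).astype(int)

features = ["x_value", "y_value"]
model = LogisticRegression(solver="lbfgs").fit(
    demo_df[features], demo_df["binary_result"]
)

# Goal 1: a probability for each (x, y) row.
# Columns of predict_proba follow model.classes_, so look up the column for class 1.
proba = model.predict_proba(demo_df[features])
col_one = list(model.classes_).index(1)
demo_df["p_one"] = proba[:, col_one]


def probability_grid(model, x_range, y_range, steps=100):
    """Evaluate P(binary_result == 1) on a regular grid of (x, y) points."""
    xs = np.linspace(*x_range, steps)
    ys = np.linspace(*y_range, steps)
    xx, yy = np.meshgrid(xs, ys)
    grid = pd.DataFrame({"x_value": xx.ravel(), "y_value": yy.ravel()})
    col = list(model.classes_).index(1)
    zz = model.predict_proba(grid)[:, col].reshape(xx.shape)
    return xx, yy, zz


def plot_heatmap(xx, yy, zz):
    """Goal 2: filled contour plot of the predicted probability of a 1."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots(figsize=(4, 4))
    filled = ax.contourf(xx, yy, zz, levels=np.linspace(0, 1, 21), cmap="RdYlGn")
    fig.colorbar(filled, ax=ax, label="P(binary_result = 1)")
    ax.set_xlabel("x_value")
    ax.set_ylabel("y_value")
    return fig


xx, yy, zz = probability_grid(model, (-3, 3), (-3, 3))

# Goal 3: compare each person's observed rate of 1s with the mean prediction.
per_person = demo_df.groupby("person").agg(
    observed_rate=("binary_result", "mean"),
    mean_predicted=("p_one", "mean"),
    n=("binary_result", "size"),
)
print(per_person)
```

Calling `plot_heatmap(xx, yy, zz)` and then `plt.show()` should draw the map. One thing to keep in mind when judging whether the numbers look "way off": a logistic regression on raw x and y has log-odds that are linear in the coordinates, so the heatmap will always be a smooth gradient across a straight boundary. If the scatter plot shows clusters or curved regions, that may be why the probabilities look wrong, and something more flexible (added polynomial terms, or pyGAM as mentioned above) might be worth trying.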