Python Forum

Unbalanced Dataset - prediction model
Hey everyone.

I am working on a personal project that could change the face of the restaurant industry.

Let's keep it simple. The dataset has 63k rows and 7 columns: 6 features that matter to me, plus the target, which has two values (show or no-show). I want to build a model that predicts whether a person will show up at a restaurant, given some characteristics (type of guest, party size, visits completed, day, hour, month). However, I have 53k rows for reservations marked "Done" against only 6k rows for no-shows. I built a random forest and a regression model, and both give me terrible results. Why? How should I deal with that? I have something big, but my model… Any help would be appreciated!
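
To put numbers on the imbalance described above, here is a minimal sketch in plain Python. It uses only the rounded counts from this post (53k "Done", 6k no-show); the function names are made up for illustration. The weighting formula is the n_samples / (n_classes * class_count) heuristic, the same one scikit-learn documents for class_weight="balanced". Whether weighting actually helps on this data is an assumption to check, not a result.

```python
# Rounded class counts taken from the post; everything else is illustrative.
counts = {"show": 53_000, "no_show": 6_000}


def majority_baseline(counts):
    """Accuracy and per-class recall of a model that always predicts the largest class."""
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    accuracy = counts[majority] / total
    # Every class other than the majority is never predicted, so its recall is 0.
    recalls = {label: (1.0 if label == majority else 0.0) for label in counts}
    return majority, accuracy, recalls


def balanced_weights(counts):
    """Per-class weights using n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


majority, accuracy, recalls = majority_baseline(counts)
weights = balanced_weights(counts)

print(f"always predicting '{majority}': accuracy = {accuracy:.3f}")
print(f"recall per class: {recalls}")
print(f"balanced class weights: {weights}")
```

With these counts, a model that never predicts a no-show still scores close to 90% accuracy, so accuracy alone may be hiding the problem. Looking at recall and precision on the no-show class is likely more informative.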

I can forward you the beginning of my dataset, which is encoded like this: Day 1 = Monday (Lundi); Hours 2 = between 6 and 7; Month 3 = March; Type of client 3 = Member; Visits completed = 4; Size = 5, meaning 5 people at the table.
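
For anyone who wants to see the shape of one row, here is a small sketch of the encoding described above as a Python dict with lookup tables. The column names are hypothetical. Only the code values listed in this post (day 1, hours 2, month 3, client type 3) are filled in; the other codes are left out because they are not given here.

```python
# Hypothetical column names; the values are the example row from the post.
row = {
    "day": 1,
    "hours": 2,
    "month": 3,
    "client_type": 3,
    "visits_completed": 4,
    "party_size": 5,
}

# Only the codes mentioned in the post are known; nothing else is guessed.
LABELS = {
    "day": {1: "Monday"},
    "hours": {2: "between 6 and 7"},
    "month": {3: "March"},
    "client_type": {3: "Member"},
}


def decode(row):
    """Return a readable copy of the row; unknown codes and plain counts stay as-is."""
    return {col: LABELS.get(col, {}).get(val, val) for col, val in row.items()}


decoded = decode(row)
print(decoded)
```

Visits completed and party size are plain counts, so they pass through unchanged.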