Solved – How to select a cut-point for making a classification table for logistic regression

classification, logistic

How can I select cut-points to convert predicted probabilities to predicted responses in order to make a classification table for logistic regression? Should I take different cut-points like .5, .6, .7, etc.? (If I take different values the prediction error rate varies for them.) How can I make a generalization when I am taking different cut-points and having different prediction error rates?

Best Answer

Setting these cut-points should really be done in the context of some decision-making process. I'll give you an example that you might be able to generalize to your context.

Let's say I build a model that estimates the probability that a person has event Y occur in the future, given their score on X today. I then estimate P(Y|X) in a new population, for whom Y hasn't occurred yet. I can then set a cut-off in the predicted probability that says which people are at "high risk" of Y. Lacking any other information or context, this cut-off is completely arbitrary and not useful.

Now let's say I want to save money, and people who have Y occur cost me some of it. I have an intervention that prevents Y from occurring some of the time, but this intervention also costs me money.

Now I have a decision to make...and that is where the cut-off I choose might have meaning. If Y is very expensive, weighed against a very cheap and effective intervention, I might set my cut-off very low, even intervening in people who would never have had the outcome occur anyway. Conversely, if Y is cheap, and/or the intervention is expensive or doesn't work well, I might set the cut-off very high (or in the extreme do nothing at all). Basically you can define an equation that relates all of these things, and choose the cut-off that saves the most money.
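
As a rough sketch of what such an equation could look like, suppose (all names and numbers here are hypothetical, not from any real data) that the event Y costs `cost_event`, the intervention costs `cost_intervention` per person treated, and it prevents Y with probability `effectiveness`. A treated person with predicted probability p then costs `cost_intervention + p * (1 - effectiveness) * cost_event` in expectation, while an untreated person costs `p * cost_event`. Under that simplified model, one way to pick the cut-off is to scan a grid of candidates and keep the cheapest:

```python
import numpy as np

def expected_total_cost(probs, cutoff, cost_event, cost_intervention, effectiveness):
    """Expected cost of intervening on everyone with predicted probability >= cutoff."""
    probs = np.asarray(probs, dtype=float)
    treat = probs >= cutoff
    # Treated: pay for the intervention; Y still occurs with probability p * (1 - effectiveness).
    cost_treated = cost_intervention + probs[treat] * (1.0 - effectiveness) * cost_event
    # Untreated: Y occurs with probability p.
    cost_untreated = probs[~treat] * cost_event
    return float(cost_treated.sum() + cost_untreated.sum())

def best_cutoff(probs, cost_event, cost_intervention, effectiveness, grid=None):
    """Return (cutoff, expected cost) for the cheapest cut-off on the grid."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # candidate cut-offs 0.00, 0.01, ..., 1.00
    costs = [
        expected_total_cost(probs, c, cost_event, cost_intervention, effectiveness)
        for c in grid
    ]
    i = int(np.argmin(costs))
    return float(grid[i]), float(costs[i])

# Hypothetical predicted probabilities standing in for the model's P(Y|X).
rng = np.random.default_rng(0)
example_probs = rng.beta(2, 5, size=1000)
cutoff, cost = best_cutoff(
    example_probs, cost_event=1000.0, cost_intervention=100.0, effectiveness=0.5
)
print(f"chosen cut-off: {cutoff:.2f}, expected total cost: {cost:.0f}")
```

In this toy model, intervening pays off whenever `p * effectiveness * cost_event > cost_intervention`, so the grid search should land near `cost_intervention / (effectiveness * cost_event)`. A real decision problem would likely need a richer cost structure than this.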

It's also helpful to do this because you start to see how hard it is! Your estimate of P(Y|X) has uncertainty...and so do all your other parameters - how much does Y cost; how effective is the intervention; how much does the intervention cost? And perish the thought you want to optimize something harder to measure than money, like happiness. This is when you will really see how useful your model is, or isn't.
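
One hedged way to get a feel for that uncertainty is to recompute the cut-off over plausible ranges of the inputs. The sketch below assumes a deliberately simple model in which the intervention prevents Y with probability `effectiveness`, so intervening is worthwhile when `p * effectiveness * cost_event > cost_intervention`. The function name, the parameter ranges, and the uniform distributions are all illustrative assumptions, not estimates from any real study.

```python
import numpy as np

def break_even_cutoff(cost_event, cost_intervention, effectiveness):
    """Probability above which intervening has lower expected cost than doing nothing.

    Assumes intervening pays off when p * effectiveness * cost_event > cost_intervention.
    Values are capped at 1.0, which amounts to "never intervene".
    """
    raw = np.asarray(cost_intervention, dtype=float) / (
        np.asarray(effectiveness, dtype=float) * np.asarray(cost_event, dtype=float)
    )
    return np.minimum(raw, 1.0)

# Draw each uncertain input from a hypothetical plausible range.
rng = np.random.default_rng(42)
n = 10_000
cost_event = rng.uniform(800.0, 1200.0, size=n)
cost_intervention = rng.uniform(80.0, 120.0, size=n)
effectiveness = rng.uniform(0.3, 0.7, size=n)

cutoffs = break_even_cutoff(cost_event, cost_intervention, effectiveness)
low, mid, high = np.percentile(cutoffs, [5, 50, 95])
print(f"break-even cut-off: median {mid:.2f}, 90% range {low:.2f} to {high:.2f}")
```

If the resulting range of cut-offs is wide, the choice of cut-off is being driven more by the uncertain costs than by the model, which is the kind of difficulty described above. Uncertainty in P(Y|X) itself is not included in this sketch.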