How to choose probability to predict success in logistic regression

cross-validation, logistic, predictive-models

I'm working through the logistic regression lab in Intro to Statistical Learning. To test how accurate their model is, they do:

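glm.pred = rep("Down", length(glm.probs))  # start with every observation labelled "Down"
glm.pred[glm.probs > .5] = "Up"            # relabel as "Up" where the predicted probability exceeds 0.5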

Essentially they are asking whether the predicted probability of a market increase is greater than or less than 0.5. But how did they choose the number 0.5? In another situation where the predicted probabilities are much lower for every prediction, should we replace 0.5 with mean(glm.probs)?

Best Answer

It depends on the problem you are trying to solve. There are four important rates that one should look at for a problem: True Positive Rate, True Negative Rate, False Positive Rate and False Negative Rate. Changing the probability threshold will typically change each of these rates.
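
To make the effect of the threshold concrete, here is a minimal sketch in R. The probabilities, labels, and the helper name `rates_at` are all made up for illustration; they are not from the ISLR lab or from any fitted model.

```r
# Hypothetical data: predicted probabilities and true labels (1 = positive, 0 = negative).
probs <- c(0.95, 0.80, 0.65, 0.55, 0.45, 0.35, 0.20, 0.10)
truth <- c(1,    1,    0,    1,    0,    1,    0,    0)

# Compute the four rates for one probability threshold (hypothetical helper).
rates_at <- function(probs, truth, threshold) {
  pred <- as.integer(probs > threshold)   # predict positive above the threshold
  tp <- sum(pred == 1 & truth == 1)       # true positives
  tn <- sum(pred == 0 & truth == 0)       # true negatives
  fp <- sum(pred == 1 & truth == 0)       # false positives
  fn <- sum(pred == 0 & truth == 1)       # false negatives
  c(threshold = threshold,
    TPR = tp / (tp + fn),
    TNR = tn / (tn + fp),
    FPR = fp / (fp + tn),
    FNR = fn / (fn + tp))
}

# One row per threshold, one column per rate.
rate_table <- t(sapply(c(0.3, 0.5, 0.7), rates_at, probs = probs, truth = truth))
print(rate_table)
```

With these made-up numbers, raising the threshold from 0.3 to 0.7 lowers the false positive rate from 0.5 to 0, but the true positive rate also drops from 1 to 0.5. Which trade-off is acceptable depends on the cost of each kind of error in your problem.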

For example, consider the email spam detection problem. It may be more important that every legitimate email reaches your inbox, and you may be willing to accept a few spam emails getting through as a result. So, if your logistic regression model predicts the probability that an email is spam, you may want to set the threshold higher than 0.5, say 0.9. This means that an email is classified as spam only if the model's predicted probability of spam is at least 0.9.
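
A rough illustration of the spam case in R follows. The probabilities, the labels, and the helper names `classify`, `legit_lost` and `spam_kept` are hypothetical, chosen only to show how the 0.5 and 0.9 thresholds behave differently.

```r
# Hypothetical spam probabilities from a model, and whether each email is truly spam.
spam_prob <- c(0.97, 0.92, 0.85, 0.70, 0.60, 0.40, 0.15, 0.05)
is_spam   <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Label an email as spam only when its probability is at least the threshold.
classify <- function(prob, threshold) ifelse(prob >= threshold, "spam", "ham")

# Legitimate emails wrongly sent to the spam folder at a given threshold.
legit_lost <- function(threshold) {
  sum(classify(spam_prob, threshold) == "spam" & !is_spam)
}

# Spam emails that still reach the inbox at a given threshold.
spam_kept <- function(threshold) {
  sum(classify(spam_prob, threshold) == "ham" & is_spam)
}

for (th in c(0.5, 0.9)) {
  cat("threshold", th,
      ": legitimate lost =", legit_lost(th),
      ", spam in inbox =", spam_kept(th), "\n")
}
```

In this toy data, the 0.5 threshold sends one legitimate email to the spam folder and lets no spam through, while the 0.9 threshold loses no legitimate email but lets two spam emails reach the inbox. That is the trade-off described above: a higher threshold protects legitimate mail at the cost of more spam.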
