Solved – Range of predicted probabilities by logistic regression

classification, logistic, probability

I have a binary classification problem with unbalanced classes, e.g. 500 examples of the negative class (0) and 20 examples of the positive class (1), and I need to estimate the probability of the positive class. I use logistic regression, which gives me extremely small probabilities, so the optimal threshold by some ROC-based criterion is about 0.1. Is there any way to force (scale) this threshold to be 0.5 or any other arbitrary number?
Thank you very much!

Best Answer

The intercept in a logistic regression model is the log odds of a "success" when all the predictor variables are at 0. This means that changing the intercept changes the "baseline" probability of a "success". One way to do this is to center your predictor variables around a meaningful value (the point at which you want to fix the probability), then fit the model using the centered predictors. After fitting, replace the intercept with the log odds of a success that you are interested in (0 for p = 0.5). If you want the model back on the original (uncentered) scale, it is just algebra to work out the corresponding intercept.
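As a rough illustration of the intercept change, here is a minimal Python sketch. It assumes a one-predictor model whose coefficients b0 and b1 are invented for the example (they are not fitted to any data), and it reads "change the intercept" as adding a constant on the log-odds scale so that the old cutoff of 0.1 lands on 0.5. The helper names logit, sigmoid and predict are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of logit: maps log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model: log odds = b0 + b1 * x (values invented for illustration)
b0, b1 = -4.0, 1.2

old_threshold = 0.1  # ROC-based cutoff mentioned in the question
new_threshold = 0.5  # cutoff we would like to use instead

# Constant shift on the log-odds scale that maps the old cutoff onto the new one
shift = logit(new_threshold) - logit(old_threshold)
b0_shifted = b0 + shift

def predict(x, intercept):
    """Predicted probability for predictor value x under the given intercept."""
    return sigmoid(intercept + b1 * x)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels_old = [predict(x, b0) >= old_threshold for x in xs]
labels_new = [predict(x, b0_shifted) >= new_threshold for x in xs]

# Both rules classify every point the same way
print(labels_old == labels_new)
```

Under these assumptions the shifted model with a 0.5 cutoff makes exactly the same decisions as the original model with a 0.1 cutoff; only the reported numbers change, and the slope b1 is untouched.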

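Just be careful when interpreting, using, or sharing this model: once the intercept has been changed, its predicted values are no longer estimates of the actual probability of the positive class.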

Another option for classification is to use Linear Discriminant Analysis (LDA), which under certain assumptions is very similar to logistic regression. The lda function in the MASS package for R has a prior argument for specifying the prior probabilities of class membership; by default it uses the class proportions in the training data. This may be a more straightforward way of accomplishing what you are trying to do.
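To see why the prior has this effect, here is a minimal Python sketch of the posterior for one-dimensional LDA with a shared variance. It is not the MASS lda function; the function name lda_posterior and the class means and variance are assumptions made up for the illustration. Only the 20 versus 500 class counts come from the question.

```python
import math

def lda_posterior(x, mu0, mu1, var, prior1):
    """Posterior P(class 1 | x) for 1-D LDA with a shared variance."""
    # The prior enters only as an additive constant on the log-odds scale
    log_prior_odds = math.log(prior1 / (1.0 - prior1))
    # Log likelihood ratio of two equal-variance Gaussians is linear in x
    log_lik_ratio = (mu1 - mu0) / var * x - (mu1 ** 2 - mu0 ** 2) / (2.0 * var)
    return 1.0 / (1.0 + math.exp(-(log_prior_odds + log_lik_ratio)))

# Hypothetical class means and shared variance (invented for illustration)
mu0, mu1, var = 0.0, 2.0, 1.0
x = 1.0  # midpoint between the class means, where the likelihood ratio is 1

# Prior taken from the class counts in the question: 20 positives out of 520
p_sample = lda_posterior(x, mu0, mu1, var, prior1=20 / 520)
# Equal priors, which puts the 0.5 cutoff at the midpoint between the means
p_equal = lda_posterior(x, mu0, mu1, var, prior1=0.5)

print(p_sample, p_equal)
```

In this sketch the prior only adds a constant to the log odds, which is the same kind of adjustment as changing the intercept of a logistic regression.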