I am trying to understand how probability works with a threshold for logistic regression.
I understand the basics of how to calculate probability.
log odds = intercept + value1 * coef1
odds = exp(log odds)
prob = odds / (1 + odds)
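In Python, that calculation might look like the sketch below. The intercept, coefficient, and feature value are made-up numbers for illustration only; they do not come from any fitted model.

```python
import math

def predicted_probability(intercept, coef1, value1):
    """Turn a linear predictor into a probability via the odds."""
    log_odds = intercept + value1 * coef1
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Hypothetical parameter values, chosen only to show the mechanics.
prob = predicted_probability(intercept=-2.0, coef1=0.5, value1=1.0)
print(f"predicted probability: {prob:.4f}")
```

This is the same quantity as the logistic function 1 / (1 + exp(-log odds)), just written in the three steps listed above.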
I understand that a threshold is used to find the optimal mix of correct predictions (precision, f1, etc.).
However, how do we interpret a probability in light of a threshold? For example, if the threshold is 0.195 and a user has a probability of 0.0975, are they:
- 50% likely to respond (1) since they are 50% towards the threshold?
- Or are they still 0.0975 (9.75%) likely to respond (1), irrespective of the fact that anyone with a probability above 0.195 is predicted to respond (1)?
Best Answer
They are predicted to have a $0.0975$ probability of responding $1$. The threshold you have chosen has no effect on the probability, only on what you do with the predicted probability later. I should note that these are just the model's estimates; they need not be the true probabilities of responding $1$. Models can certainly be wrong!
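To make the separation concrete, here is a minimal Python sketch using the numbers from the question. The `classify` helper, the extra thresholds $0.05$ and $0.5$, and the convention that a probability equal to the threshold counts as $1$ are all assumptions made for illustration.

```python
def classify(prob, threshold):
    """Hypothetical decision rule: label 1 when prob reaches the threshold."""
    return 1 if prob >= threshold else 0

prob = 0.0975  # the model's estimate; it is never rescaled by the threshold

# Only the label depends on the threshold; the probability stays the same.
labels = {threshold: classify(prob, threshold) for threshold in (0.05, 0.195, 0.5)}

for threshold, label in labels.items():
    print(f"threshold={threshold}: probability={prob}, label={label}")
```

Moving the threshold changes the label assigned to this user, but the estimate of $0.0975$ is the same on every line.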
As a final note, I need to point out that positing a threshold and calling all observations above it $1$ is not generally a good thing to do. There is more information in the predicted probability than in the attempted classification.
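One way to see the lost information is the rough Python sketch below. The three probabilities are invented for illustration (only $0.0975$ and the $0.195$ threshold come from the question), and it again assumes that a probability at or above the threshold is labeled $1$.

```python
threshold = 0.195

# Hypothetical predicted probabilities for three users.
probs = [0.19, 0.0975, 0.001]

# Hard classification: every user falls below the threshold.
labels = [1 if p >= threshold else 0 for p in probs]

# The probabilities still distinguish the users and imply a
# non-zero expected number of responders.
expected_responders = sum(probs)

print("labels:", labels)
print("expected responders:", round(expected_responders, 4))
```

All three users receive the same label, even though the first is estimated to be far more likely to respond than the last, and the labels alone would suggest that nobody responds.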