Training error and logistic regression

logistic, scoring-rules

Suppose we have a training data set with variables $a,b$ and $c$ and binary outcome variable $y$. We fit a logistic regression model to this data set:

$$\text{logit}(p) = \hat{\beta}_0 + \hat{\beta}_1 a + \hat{\beta}_2 b + \hat{\beta}_3 c$$

When we get the predicted probabilities for the training data set from the logistic regression model, are we supposed to get perfect classification? Or does this depend on the threshold we use?

Best Answer

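The proportion classified correctly is an improper scoring rule, i.e., it can be optimized by a model that selects the wrong features and estimates the wrong coefficients. It also depends on the threshold, and any threshold imposed on a continuous quantity such as a predicted probability is arbitrary and problematic. It is better to judge the model by the predicted probabilities themselves, using a proper scoring rule such as the Brier score or the log-likelihood.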
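
To illustrate, here is a minimal sketch. The outcomes `y` and predicted probabilities `p_hat` are made-up numbers, not output from any fitted model, and the helper names are my own. It suggests two things: the proportion classified correctly changes with the threshold, and at a fixed threshold it cannot tell reasonable probabilities from grossly overconfident ones, whereas the Brier score and log loss (both proper scoring rules) penalize the overconfident version.

```python
import math

# Hypothetical binary outcomes and predicted probabilities (illustrative only).
y = [0, 0, 1, 0, 1, 1, 0, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]


def accuracy(y, p, threshold):
    """Proportion classified correctly when predicting 1 if p >= threshold."""
    hits = sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p))
    return hits / len(y)


def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p):
    """Average negative Bernoulli log-likelihood."""
    total = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p))
    return -total / len(y)


# 1) Accuracy depends on the (arbitrary) threshold.
for t in (0.25, 0.5, 0.75):
    print(f"threshold {t}: accuracy = {accuracy(y, p_hat, t)}")

# 2) Push every probability to 0.01 or 0.99 depending on its side of 0.5.
#    Classification at the 0.5 threshold is unchanged, so accuracy is too.
p_crude = [0.99 if pi >= 0.5 else 0.01 for pi in p_hat]

for name, p in (("p_hat", p_hat), ("p_crude", p_crude)):
    print(f"{name}: accuracy@0.5 = {accuracy(y, p, 0.5)}, "
          f"Brier = {brier(y, p):.3f}, log loss = {log_loss(y, p):.3f}")
```

With these numbers the accuracy is 0.75 at thresholds 0.25 and 0.5 but 0.5 at threshold 0.75, and it is never perfect because the outcomes are not separable by the predicted probabilities. Both sets of probabilities get the same accuracy at 0.5, yet the overconfident set has a worse (larger) Brier score and log loss.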