I have a question about how to use cross-validation to select a probability threshold for logistic regression. Suppose I want to minimize the misclassification rate, and I use 5-fold CV. Is this procedure correct:
1. Fit 5 logistic regression models, each using 4 of the 5 folds of the data.
2. For each probability threshold (e.g. from 0.01 to 0.99), apply each of the 5 models to its held-out fold and compute the misclassification rate, then average these 5 error rates.
3. The optimal probability threshold is the one with the smallest average misclassification rate.
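A minimal sketch of the procedure above, written with scikit-learn (an assumed library; the question uses R/glmnet, but the logic carries over directly — the dataset, grid, and parameter choices here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Illustrative synthetic data in place of a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

thresholds = np.arange(0.01, 1.00, 0.01)  # candidate thresholds 0.01 .. 0.99
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# errors[i, j] = misclassification rate on held-out fold i at threshold j
errors = np.zeros((5, len(thresholds)))
for i, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Step 1: fit a model on the 4 training folds.
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    # Step 2: predicted P(y = 1 | x) on the left-out fold.
    p = model.predict_proba(X[test_idx])[:, 1]
    for j, t in enumerate(thresholds):
        errors[i, j] = np.mean((p >= t).astype(int) != y[test_idx])

# Step 3: average over folds, pick the threshold with the smallest error.
mean_error = errors.mean(axis=0)
best_threshold = thresholds[np.argmin(mean_error)]
```

Note that the same 5 fitted models are reused for every threshold, so sweeping the grid is cheap: only the comparison `p >= t` changes.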
And suppose I fit a ridge logistic regression model. To select the tuning parameter $\lambda$, is it okay to first use CV to select an optimal $\lambda$ (e.g. with the `cv.glmnet` function in the R package glmnet), and then apply this parameter in the procedure above to find the probability threshold?
Best Answer
Yes, I think you have the right idea.
Just to put it another way, I'd say:
Fix the hyper-parameters that you don't want to search for (e.g. as you mention, this could be a regularization strength $\lambda$)
Choose a set of hyper-parameters $\theta$ you want to "optimize" by CV and split your dataset into folds (note it is standard practice to first remove a portion of your data as a testing set, and then use the remaining part for CV).
Fix a search space (e.g. $\theta_i\in[a_i,b_i]\;\forall\; i$) and then for each $\hat{\theta}$ in the space, learn the model $f(x;\hat{\theta})$ and compute the average error $\mathcal{E}(f)$ over the CV folds. Keep the $\hat{\theta}$ (and its model $f$) with the least error.
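The recipe above can be sketched as follows, again using scikit-learn (an assumed library; the search space, `C` parameterization, and data are illustrative, with `C` playing the role of the hyper-parameter $\theta$):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Illustrative synthetic data in place of a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Standard practice: first remove a testing set; CV uses only the rest.
X_cv, X_test, y_cv, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fixed search space for the hyper-parameter (here: inverse regularization
# strength C, on a log grid).
search_space = [0.01, 0.1, 1.0, 10.0]

cv_errors = {}
for C in search_space:
    # Average accuracy over 5 CV folds, converted to misclassification rate.
    scores = cross_val_score(
        LogisticRegression(C=C, max_iter=1000), X_cv, y_cv, cv=5
    )
    cv_errors[C] = 1.0 - scores.mean()

# Keep the hyper-parameter value with the least average CV error.
best_C = min(cv_errors, key=cv_errors.get)
```

After selecting `best_C`, the usual final step is to refit on all of `X_cv` with that value and report performance once on the held-out `X_test`.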