Solved – How to determine the optimal threshold to achieve the highest accuracy

optimization, threshold

I have a list of probabilities output by a classifier on a balanced dataset. The metric I want to maximize is accuracy ($\frac{TP+TN}{P+N}$). Is there a way to calculate the best threshold (without iterating over many threshold values and selecting the best one), given the probabilities and their true labels?

Best Answer

I suspect that the answer is "no", i.e., that there is no such way.

Here is an illustration, where we plot the predicted probabilities against the true labels:

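[Figure: predicted probabilities plotted against true labels, with a horizontal red line marking the threshold]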

Since the denominator $P+N$ in the formula for accuracy does not change, what you are trying to do is to shift the horizontal red line up or down (the height being the threshold you are interested in) in order to maximize the number of "positive" dots above the line plus the number of "negative" dots below the line. Where this optimal line lies depends entirely on the shape of the two point clouds, i.e., the conditional distribution of the predicted probabilities per true label.
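
To make the quantity being maximized concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a closed-form answer: the function name `best_accuracy_threshold` is hypothetical, labels are assumed to be coded 0/1, and a point is assumed to be predicted positive when its probability is at or above the threshold. The sketch still examines every distinct predicted probability as a candidate threshold, but it does so in a single pass over the sorted scores, counting "positive" dots above the line plus "negative" dots below it.

```python
from itertools import groupby


def best_accuracy_threshold(probs, labels):
    """Return (threshold, accuracy) that maximizes accuracy for the rule
    'predict positive when prob >= threshold'. Labels are assumed to be 0/1."""
    n = len(probs)
    # Sort by predicted probability, highest first, so that lowering the
    # line moves points from "below" to "above" one score at a time.
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    total_neg = sum(1 for y in labels if y == 0)

    # Start with the line above every point: everything is predicted
    # negative, so exactly the negatives are classified correctly.
    best_thr, best_correct = float("inf"), total_neg
    tp = fp = 0
    # Group tied scores so each distinct value is evaluated once.
    for score, group in groupby(pairs, key=lambda pair: pair[0]):
        for _, y in group:
            if y == 1:
                tp += 1  # a positive now sits on or above the line
            else:
                fp += 1  # a negative now sits on or above the line
        correct = tp + (total_neg - fp)  # TP + TN at this threshold
        if correct > best_correct:
            best_thr, best_correct = score, correct
    return best_thr, best_correct / n


probs = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
thr, acc = best_accuracy_threshold(probs, labels)
print(thr, acc)
```

When several thresholds tie for the best accuracy, this sketch keeps the highest one, because it only updates on a strict improvement. The cost is dominated by the sort, so it is $O(n \log n)$ in the number of points.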

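Your best bet is likely a bisection search. Keep in mind, though, that accuracy as a function of the threshold is a step function that need not have a single peak, so such a search can settle on a local rather than the global optimum.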

That said, I recommend you look at