Solved – Classification accuracy based on probability

Tags: accuracy, classification, probability

Let's say we have a simple binary classification problem: for a predictor X we want to predict a response Y, which is binary, so either 0 or 1. Now suppose we use two different classifiers, model1 and model2. When predicting a new data point x_i, model1 says with a probability of 0.9 that y_i = 1, while model2 says with a probability of 0.6 that y_i = 1. So if in reality y_i = 0, both models produce the same wrong label. This means that standard statistics such as overall accuracy, kappa, etc. will be the same for both models. Yet intuitively I feel that model1 is less accurate, since it was more sure about its wrong prediction.

Are there some other classifier performance metrics that actually take this into account? It makes little sense to me that whether a prediction is 0.51 or 1 does not change classifier performance as long as the labels stay the same.

Best Answer

Classifier metrics that score the predicted probabilities against the observed classes (and that are optimised, in expectation, by reporting the true probabilities) go by the name of proper scoring rules. The two most popular are the log-loss

$$ L = -\sum_i \left[\, y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \,\right] $$

and the Brier score

$$ L = \sum_i (y_i - p_i)^2 $$

The log-loss is used more often in practice, as it is (up to sign) the log-likelihood of the Bernoulli distribution: minimising the log-loss is the same as maximising the likelihood of the predicted probabilities.
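To see how this addresses the question directly, here is a minimal sketch (assuming NumPy and scikit-learn; the variable names are mine, not from the question) that scores model1 and model2 on the single point from your example, where the true label is 0 but both models favour class 1:

```python
import numpy as np
from sklearn.metrics import log_loss, brier_score_loss

y_true   = np.array([0])     # the true label y_i
p_model1 = np.array([0.9])   # model1: very confident that y_i = 1
p_model2 = np.array([0.6])   # model2: less confident that y_i = 1

# Both models assign the same hard label (1), so accuracy, kappa, etc. are
# identical, but the proper scoring rules penalise the over-confident model more.
print(log_loss(y_true, p_model1, labels=[0, 1]))   # ~2.30
print(log_loss(y_true, p_model2, labels=[0, 1]))   # ~0.92
print(brier_score_loss(y_true, p_model1))          # 0.81
print(brier_score_loss(y_true, p_model2))          # 0.36
```

Both scores are lower (better) for model2, matching the intuition that a confidently wrong prediction should be punished more heavily.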

It is good practice to fit and compare models using proper scoring rules, as doing so checks that your predicted probabilities fit the data well and are calibrated. Once you have a well-fit probability model, it can be used to answer a multitude of questions that cannot be answered with class assignments alone.
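As a rough illustration of what "calibrated" means, here is a sketch (assuming scikit-learn's calibration_curve; the data are simulated stand-ins, not from any real model) that bins the predicted probabilities and compares each bin's mean prediction with the observed positive rate:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
probs  = rng.uniform(size=1000)   # stand-in for a model's predicted probabilities
y_test = rng.binomial(1, probs)   # labels generated consistently with those probabilities

# Compare the mean predicted probability per bin with the observed fraction of
# positives; a well-calibrated model lies close to the diagonal y = x.
frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"mean predicted {mp:.2f}  observed positive rate {fp:.2f}")
```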

Additionally, the AUC is a popular metric. It is not a proper scoring rule, but it can be used to evaluate any probabilistic classifier in terms of average performance across the full range of hard classification thresholds. The AUC is the probability that a randomly chosen positive example receives a greater predicted probability than a randomly chosen negative example.
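That pairwise interpretation can be checked directly. Here is a small sketch (assuming scikit-learn, with made-up labels and probabilities) comparing roc_auc_score with a brute-force count over all positive/negative pairs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
p_pred = np.array([0.2, 0.6, 0.7, 0.9, 0.4, 0.3])

# AUC computed from the ROC curve ...
auc = roc_auc_score(y_true, p_pred)

# ... equals the fraction of (positive, negative) pairs in which the positive
# example gets the higher predicted probability (ties count as 1/2).
pos = p_pred[y_true == 1]
neg = p_pred[y_true == 0]
pairwise = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])
print(auc, pairwise)   # both ~0.78
```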
