Scoring Rules – Choosing Among Proper Scoring Rules for Model Selection

classification, machine learning, mathematical statistics, model selection, scoring rules

Most resources on proper scoring rules mention a number of different scoring rules, such as log-loss, the Brier score, or spherical scoring. However, they often give little guidance on the differences between them. (Exhibit A: Wikipedia.)

Picking the model that maximizes the logarithmic score corresponds to picking the maximum-likelihood model, which seems like a good argument for using logarithmic scoring. Are there similar justifications for Brier or spherical scoring, or other scoring rules? Why would someone use one of these rather than logarithmic scoring?
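
For concreteness, here is a minimal sketch of the three rules I have in mind, in their usual textbook forms (the function names are my own; I note for each whether higher or lower is better):

```python
import numpy as np

def log_score(p, y):
    """Logarithmic score: log of the probability assigned to the
    observed class y (higher is better)."""
    return np.log(p[y])

def brier_score(p, y):
    """Brier score: squared error between the forecast vector and the
    one-hot encoding of the outcome (lower is better)."""
    outcome = np.zeros_like(p)
    outcome[y] = 1.0
    return np.sum((p - outcome) ** 2)

def spherical_score(p, y):
    """Spherical score: probability of the observed class, normalised by
    the Euclidean norm of the whole forecast (higher is better)."""
    return p[y] / np.linalg.norm(p)

# A forecast over three classes and the class that actually occurred.
p = np.array([0.7, 0.2, 0.1])
y = 0
print(log_score(p, y), brier_score(p, y), spherical_score(p, y))
```

Summing the log score over independent observations gives exactly the log-likelihood of the data, which is where the maximum-likelihood connection comes from.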

Best Answer

Why would someone use one of these rather than logarithmic scoring?

Ideally, we always distinguish fitting a model from making a decision. In Bayesian methodology, model scoring and selection should always be done using the marginal likelihood. You then use the model to make probabilistic predictions, and your loss function tells you how to act on those predictions.
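
For reference, the marginal likelihood (model evidence) of a model $M_k$ with parameters $\theta$ integrates the likelihood against the prior, and two candidate models are then compared through the ratio of their evidences, the Bayes factor:

$$p(D \mid M_k) = \int p(D \mid \theta, M_k)\, p(\theta \mid M_k)\, \mathrm{d}\theta, \qquad \mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}.$$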

Unfortunately, in the real world, computational performance often dictates that we conflate model selection with decision-making and so use a loss function to fit our models. This is where subjectivity in model selection creeps in, because you have to guess just how much different kinds of mistakes will cost you. The classic example is a diagnostic test for cancer: overestimating someone's probability of having cancer is not good, but underestimating it is much worse.
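
As a toy illustration of that asymmetry (the cost numbers below are invented purely for the example), acting on a predicted cancer probability by minimising expected cost might look like this:

```python
# Hypothetical costs: a missed cancer (false negative) is taken to be far
# more expensive than an unnecessary follow-up (false positive).
COST_FALSE_NEGATIVE = 100.0
COST_FALSE_POSITIVE = 1.0

def expected_cost(p_cancer, intervene):
    """Expected cost of intervening (or not) given the model's predicted
    probability that the patient has cancer."""
    if intervene:
        # The false-positive cost is only paid if the patient is healthy.
        return (1 - p_cancer) * COST_FALSE_POSITIVE
    # The false-negative cost is only paid if the patient has cancer.
    return p_cancer * COST_FALSE_NEGATIVE

def decide(p_cancer):
    """Intervene whenever that has the lower expected cost; with these
    costs the threshold works out to about 1%."""
    return expected_cost(p_cancer, True) < expected_cost(p_cancer, False)

print(decide(0.005), decide(0.02))  # False, True
```

The same predicted probability can lead to different actions under a different cost structure, and that is exactly the subjectivity the fitting step inherits once the loss function is baked into model selection.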

As an aside, if you're looking for guidance on how to pick a scoring rule, you might also want to look for guidance on picking a loss function or designing a utility function, as I think the literature on those two topics is considerably more extensive.