Forecasting – Comparing Results Using Brier Score and Logarithmic Scoring Rule

forecasting, model-comparison, scoring-rules

Let $X_i \sim B(\pi_i)$ (Bernoulli) for $i = 1, 2, \ldots, n$. I have two models and I want to compare which of them forecasts better.

Model 1: estimates the parameters by maximum likelihood.

Model 2: estimates the parameters with a Bayesian approach.

I use the Brier score and the Logarithmic scoring rule for comparison. The results are:

| Model   | Brier score | Minus log-score |
|---------|-------------|-----------------|
| Model 1 | 0.2505      | 0.6350          |
| Model 2 | 0.2544      | 0.6028          |

The smaller the score, the better the model. So according to the Brier score, Model 1 is better, while according to the log-score, Model 2 is better.
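For concreteness, the two scores can be computed along these lines (a minimal sketch with placeholder outcomes and probabilities, not my actual data or models):

```python
import numpy as np

def brier_score(y, p):
    """Brier score: mean of (p_i - y_i)^2 over the n binary outcomes."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return np.mean((p - y) ** 2)

def neg_log_score(y, p, eps=1e-15):
    """Minus log-score: mean negative Bernoulli log-likelihood of the outcomes."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Placeholder outcomes and forecast probabilities (illustration only)
y  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p1 = np.array([0.7, 0.4, 0.6, 0.8, 0.3, 0.5, 0.6, 0.2])  # e.g. ML-based forecasts
p2 = np.array([0.9, 0.2, 0.7, 0.9, 0.1, 0.6, 0.5, 0.2])  # e.g. Bayes-based forecasts

for name, p in [("Model 1", p1), ("Model 2", p2)]:
    print(f"{name}: {brier_score(y, p):.4f} (Brier), "
          f"{neg_log_score(y, p):.4f} (minus log-score)")
```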

I would like to ask why there is this difference. Also, is there a paper on how to compare the forecasting ability of a frequentist model with that of a Bayesian one?

Best Answer

Without getting into the Bayesian vs. frequentist part of your question, the two proper accuracy scores reward different things, and it is not surprising that they behave differently. The logarithmic score rewards more extreme predictions that are in the right direction. It can also be ruined by a single predicted probability of 0 or 1 that is in the wrong direction, because that requires taking the log of zero. The logarithmic rule is a rescaling of the gold-standard optimization criterion (in the absence of other knowledge that Bayesians would encode in the prior distribution), the log-likelihood, so in a sense it is the best accuracy score to use for binary $Y$.
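As a toy illustration of that sensitivity (made-up numbers, not your models): a single near-zero probability assigned to an outcome that actually occurs adds at most $1/n$ to the Brier score, but drives the log-score toward infinity.

```python
import numpy as np

def brier_score(y, p):
    return np.mean((np.asarray(p, float) - np.asarray(y, float)) ** 2)

def neg_log_score(y, p):
    y, p = np.asarray(y, float), np.asarray(p, float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y    = np.array([1, 1, 0, 0, 1])
good = np.array([0.9, 0.8, 0.2, 0.1, 0.7])    # reasonable forecasts
bad  = np.array([0.9, 0.8, 0.2, 0.1, 1e-12])  # last forecast ~0, but the outcome is 1

print(brier_score(y, good), neg_log_score(y, good))  # both scores small
print(brier_score(y, bad),  neg_log_score(y, bad))   # Brier penalty is bounded by 1/n;
                                                     # the log-score term grows without bound
# With a forecast of exactly 0 the log-score would be infinite (log of zero).
```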