Difference between confidence interval and RMSE

For a long time, I have been doing chemometrics, using partial least squares (PLS) regression to predict the composition of samples from their spectra and calculating the root mean squared error (RMSE) to estimate the model's accuracy.

I am now looking into using Gaussian process regression (GPR) to do something similar. Because GPR is a Bayesian method, its predictions automatically come with a confidence interval, but I could also calculate an RMSE just as I do with PLS.

So, my question is: If I want to report the accuracy of a prediction using GPR, do I report the RMSE or the confidence interval (or both)?

Best Answer

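By "confidence interval", you probably mean a prediction interval. The two terms are often confused, but yes, there is a difference: a confidence interval describes uncertainty about a parameter such as the mean, while a prediction interval aims to cover a new, individual observation.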
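To make the distinction concrete, here is a minimal sketch for the simplest textbook case of i.i.d. normal data, where a confidence interval for the mean is mean ± q·s/√n and a prediction interval for one new observation is mean ± q·s·√(1 + 1/n). The function name `intervals` and the sample values are hypothetical, and for simplicity the sketch uses the standard normal quantile q as a large-sample approximation to the Student-t quantile.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def intervals(sample, level=0.90):
    """Return (CI for the mean, PI for one new value) under an i.i.d. normal assumption.

    Uses the standard normal quantile as a large-sample stand-in for the
    Student-t quantile, so this is an approximation for small samples.
    """
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    q = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% level
    ci_half = q * s / sqrt(n)           # uncertainty about the mean only
    pi_half = q * s * sqrt(1 + 1 / n)   # adds the noise of a single new observation
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)

# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]
ci, pi = intervals(sample)
print("90% CI for the mean:   ", ci)
print("90% PI for a new value:", pi)
```

As n grows, the confidence interval shrinks towards zero width, whereas the prediction interval never gets narrower than roughly ±q·s, because a single new observation always carries its own noise.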

Point predictions and prediction intervals offer different kinds of information.

  • Point predictions predict a single number. You can assess their quality using the RMSE or a lot of other accuracy KPIs, like the mean absolute error (MAE). Which KPI is most meaningful really depends on your loss function: the RMSE penalizes large errors much more than the MAE.

    If your Bayesian method gives you an entire posterior distribution, you can extract a point prediction from it in various ways, e.g., by taking the mean, the median, or the mode. Fun fact: if you use the mean squared error (MSE) or the RMSE, take the mean; if you use the MAE, take the median. This is because the mean minimizes the expected squared error and the median minimizes the expected absolute error. There won't be much of a difference if the posterior distribution is symmetric, but if it's asymmetric, the choice can indeed matter.

  • Prediction intervals (PIs) give interval-valued predictions, which aim to cover the actual value, say, 80% or 90% of the time over "many" predictions. The simplest way to assess the quality of your PIs is to count how often they actually cover the realized values and compare that proportion with your prespecified coverage. If you want to know whether the deviation from the prespecified coverage is statistically significant, you can test the resulting contingency table or run a binomial test.
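The claims about point predictions can be checked with a small standard-library sketch. The helpers `rmse` and `mae` and all numbers are hypothetical illustrations: two error vectors with the same MAE show how the RMSE punishes a single large miss, and a grid search over candidate point predictions for a right-skewed set of draws (standing in for posterior samples) shows that total squared loss is minimized at the mean and total absolute loss at the median.

```python
from math import sqrt
from statistics import mean, median

def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

# Same MAE, but the second set of errors contains one large miss.
even_errors = [1, -1, 1, -1]
spiky_errors = [0, 0, 0, 4]
print(mae(even_errors), mae(spiky_errors))    # 1.0 1.0
print(rmse(even_errors), rmse(spiky_errors))  # 1.0 2.0

# Hypothetical draws from an asymmetric (right-skewed) posterior.
draws = [1.0, 1.0, 2.0, 3.0, 10.0]
candidates = [i / 100 for i in range(0, 1001)]  # point predictions 0.00 ... 10.00

def best(loss):
    """Candidate point prediction with the smallest total loss over the draws."""
    return min(candidates, key=lambda c: sum(loss(d - c) for d in draws))

best_squared = best(lambda e: e * e)  # minimizer of squared loss
best_absolute = best(abs)             # minimizer of absolute loss
print(best_squared, mean(draws))      # 3.4 3.4
print(best_absolute, median(draws))   # 2.0 2.0
```

Because these draws are asymmetric, the two losses pick noticeably different point predictions (3.4 versus 2.0); for symmetric draws, the mean and median would coincide.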
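Counting coverage and running an exact binomial test on it takes only a few lines. This is a sketch under stated assumptions: the helper names `coverage_count`, `binom_pmf` and `binom_test_two_sided` are hypothetical, intervals are treated as closed, and the two-sided p-value sums the probabilities of all outcomes that are no more likely than the observed count, a common convention for exact binomial tests. If SciPy is available, `scipy.stats.binomtest` is an existing library alternative.

```python
from math import comb

def coverage_count(lower, upper, actual):
    """Count how many actual values fall inside their closed interval [lower, upper]."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lower, upper, actual))
    return hits, len(actual)

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test p-value for k hits out of n at nominal coverage p.

    Sums the probabilities of all outcomes no more likely than the observed one;
    the small relative tolerance guards against floating-point ties.
    """
    observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-7)
    )
    return min(1.0, total)

# Toy example: four prediction intervals and the values that materialized.
lower = [0.0, 0.0, 0.0, 0.0]
upper = [1.0, 1.0, 1.0, 1.0]
actual = [0.5, 2.0, 0.1, 1.0]
hits, n = coverage_count(lower, upper, actual)
print(hits, n)  # 3 4

# With many predictions: 60 hits out of 100 is far below a nominal 80%.
print(binom_test_two_sided(60, 100, 0.8))  # very small p-value
print(binom_test_two_sided(80, 100, 0.8))  # essentially 1
```

With only a handful of predictions, the test has little power, so a coverage check like this is most informative once you have many predictions to compare against the nominal level.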

Whether you want a point prediction or a prediction interval depends on what you want to do with your predictions. Point predictions are easier for non-statisticians to grasp. Prediction intervals allow you to do scenario analyses, and, if they are well calibrated, they tell you how sure you are about a prediction.