Solved – Does it make sense to talk about the standard deviation of RMSE

mse, rms, standard deviation

I want to plot the RMSE of several models to compare their performance on a dataset. I'd also like to include error bars, because I know the models may have very different standard deviations.

The problem is, does it even make sense to talk about standard deviation of RMSE? I thought about taking the square root of the standard deviation of the MSE, but I don't know if that's what I need.

Best Answer

RMSE is the square root of MSE. The answer to your question depends on whether you are talking about the MSE of an estimator or the MSE of a predictor.

MSE of Estimator

The MSE of an estimator is an expectation over all possible samples, so it is a fixed quantity and has zero variance. It therefore makes no sense to talk about the SD of this MSE.

Consider a special case: model $M_1$, $Y_i = \beta + \varepsilon_i$, for $n$ iid observations, with $E[Y_i]=\beta$ and $Var[Y_i] = \phi$ for all $i$. An unbiased estimator of $\beta$ is the sample mean, $\hat \beta = n^{-1} \sum_i Y_i$. Now $\hat \beta$ is a function of a random sample, so it is itself random and has a variance. Since it is unbiased, its MSE equals its variance, and the variance of a mean of $n$ iid observations is $\phi/n$: $MSE[\hat \beta]=Var[\hat\beta] = \phi/n$. For a given $n$ this is a constant, so

$Var\big[MSE[\hat\beta]\big]=Var\big[\phi/n\big]=0$.
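As a rough numerical check, the estimator's MSE can be approximated by Monte Carlo. This is only a sketch under assumptions the answer does not make: the data are taken to be normal, and the values of `beta`, `phi` and `n` are hypothetical. The point is that the MSE is an average over many samples, so two independent approximations land on essentially the same number, the variance of the sample mean, `phi / n`.

```python
import numpy as np

def monte_carlo_estimator_mse(beta, phi, n, n_reps=100_000, seed=0):
    """Approximate E[(beta_hat - beta)^2] for the sample mean by simulation."""
    rng = np.random.default_rng(seed)
    # Assumption: normal data with mean beta and variance phi.
    samples = rng.normal(loc=beta, scale=np.sqrt(phi), size=(n_reps, n))
    beta_hat = samples.mean(axis=1)           # one estimate per simulated sample
    return np.mean((beta_hat - beta) ** 2)    # average squared estimation error

beta, phi, n = 2.0, 4.0, 25                   # hypothetical values
approx_a = monte_carlo_estimator_mse(beta, phi, n, seed=0)
approx_b = monte_carlo_estimator_mse(beta, phi, n, seed=1)
theoretical = phi / n                         # variance of the mean of n iid draws

print(f"run A: {approx_a:.4f}  run B: {approx_b:.4f}  phi/n: {theoretical:.4f}")
```

The small difference between the two runs is Monte Carlo error in the approximation, not variability of the MSE itself.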

MSE of a Predictor

The MSE of a predictor is computed from a random sample, so it is itself random and therefore has a variance. Consider the predictor from the model $M_1$ above:

$MSE_{pred} = n^{-1} \sum_i(Y_i - \hat Y)^2 = n^{-1} \sum_i(Y_i - \hat \beta)^2 $

$\hat \beta$ is a function of the data (random), and the $Y_i$ are the data (also random), so the whole MSE is a statistic and is itself random. It therefore has a variance, and we can meaningfully talk about the SD of the MSE, and likewise of the RMSE.
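The sketch below illustrates this in two ways. The first part draws many samples from $M_1$ and looks at how the predictor's MSE and RMSE vary across them. The second part estimates the SD of the RMSE from a single data set with a bootstrap over the squared errors. The bootstrap is one possible way to get error bars, not something this answer prescribes. It does not refit the model on each resample, so it ignores variability from fitting. The normal data and the values of `beta`, `phi` and `n` are assumptions made for illustration.

```python
import numpy as np

def mse_pred(y):
    """In-sample MSE of the constant predictor beta_hat = mean(y)."""
    beta_hat = y.mean()
    return np.mean((y - beta_hat) ** 2)

def bootstrap_rmse_sd(y_true, y_pred, n_boot=2000, seed=0):
    """Bootstrap SD of the RMSE, resampling squared errors with replacement."""
    rng = np.random.default_rng(seed)
    sq_err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    idx = rng.integers(0, sq_err.size, size=(n_boot, sq_err.size))
    boot_rmse = np.sqrt(sq_err[idx].mean(axis=1))  # one RMSE per resample
    return boot_rmse.std(ddof=1)

rng = np.random.default_rng(42)
beta, phi, n = 2.0, 4.0, 50                        # hypothetical values

# Part 1: the predictor's MSE changes from sample to sample.
mse_draws = np.array([
    mse_pred(rng.normal(beta, np.sqrt(phi), size=n)) for _ in range(5000)
])
sd_mse = mse_draws.std(ddof=1)
sd_rmse = np.sqrt(mse_draws).std(ddof=1)
print(f"SD of MSE: {sd_mse:.3f}  SD of RMSE: {sd_rmse:.3f}  "
      f"sqrt(SD of MSE): {np.sqrt(sd_mse):.3f}")

# Part 2: error bar for the RMSE from one observed data set.
y = rng.normal(beta, np.sqrt(phi), size=n)
y_hat = np.full(n, y.mean())
rmse = np.sqrt(np.mean((y - y_hat) ** 2))
rmse_sd = bootstrap_rmse_sd(y, y_hat)
print(f"RMSE: {rmse:.3f} +/- {rmse_sd:.3f} (bootstrap SD)")
```

Note that the square root of the SD of the MSE is not the SD of the RMSE, because the square root is a nonlinear transformation. The error bar has to be computed on the RMSE values themselves.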