Solved – How to interpret root mean squared error (RMSE) vs standard deviation

rmsstandard deviationstandard error

Let's say I have a model that gives me projected values. I calculate RMSE of those values. And then the standard deviation of the actual values.

Does it make any sense to compare those two values (variances)? What I think is, if RMSE and standard deviation is similar/same then my model's error/variance is the same as what is actually going on. But if it doesn't even make sense to compare those values then this conclusion could be wrong. If my thought is true, then does that mean the model is as good as it can be because it can't attribute what's causing the variance? I think that last part is probably wrong or at least needs more information to answer.

Best Answer

Let's say that our responses are $y_1, \dots, y_n$ and our predicted values are $\hat y_1, \dots, \hat y_n$.

The sample variance (using $n$ rather than $n-1$ for simplicity) is $\frac{1}{n} \sum_{i=1}^n (y_i - \bar y)^2$ while the MSE is $\frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2$. Thus the sample variance gives how much the responses vary around the mean while the MSE gives how much the responses vary around our predictions. If we think of the overall mean $\bar y$ as being the simplest predictor that we'd ever consider, then by comparing the MSE to the sample variance of the responses we can see how much more variation we've explained with our model. This is exactly what the $R^2$ value does in linear regression.

Consider the following picture: The sample variance of the $y_i$ is the variability around the horizontal line. If we project all of the data onto the $Y$ axis we can see this. The MSE is the mean squared distance to the regression line, i.e. the variability around the regression line (i.e. the $\hat y_i$). So the variability measured by the sample variance is the averaged squared distance to the horizontal line, which we can see is substantially more than the average squared distance to the regression line.

Best Answer

Related Solutions

Regression – How to Calculate the Specific Standard Error Relevant for a Specific Point Estimate Within a Linear Regression

Solved – Does it make sense to talk about the standard deviation of RMSE

MSE of Estimator

MSE of a Predictor

Related Question