MATLAB: Difference between self-calculated RMSE and RMSE calculated with the Statistics Toolbox

Tags: rmse, Statistics and Machine Learning Toolbox

Hi all, I calculated the RMSE of this data:
Y_hat=[
9.774614325191857
9.453084986417043
9.502166049524247
7.817755496590051
7.031233831915310
8.392026077578970
6.881255539731626
6.488927374899896
6.779374282307657
6.474790314047517
13.842988631876649
13.113764172190285
14.244292841981128
12.470726075747763]
Y=[
8.900000000000000
8.600000000000000
9.167000000000000
7.000000000000000
7.030000000000000
7.270000000000000
7.430000000000000
7.270000000000000
7.370000000000000
7.030000000000000
15.029999999999999
13.170000000000000
13.369999999999999
13.630000000000001]
My calculation is based on the formula RMSE = sqrt(mean((Y_hat-Y).^2)). With this calculation I got RMSE = 0.7847.
But with MATLAB's Statistics Toolbox I got RMSE = 0.885, which is the square root of my calculated value! Who is wrong: me or the toolbox?
Thank you!

Best Answer

You calculated the RMSE incorrectly, and then ran into a remarkable numerical coincidence.
You calculated
RMSE = sqrt(mean(((Y-Y_hat).^2)))
which is equivalent to
RMSE = sqrt(sum(((Y-Y_hat).^2)/N_obs))
where N_obs is the number of observations. (N_obs = 14 in your case.) You got the value RMSE = 0.7847.
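To check that the two forms above really are the same calculation, here is a minimal sketch in Python (standard library only, used instead of MATLAB so it runs without a license). The variable names are illustrative; the data is copied from the question.

```python
import math
import statistics

# Data from the question: fitted values (Y_hat) and observed values (Y)
y_hat = [9.774614325191857, 9.453084986417043, 9.502166049524247,
         7.817755496590051, 7.031233831915310, 8.392026077578970,
         6.881255539731626, 6.488927374899896, 6.779374282307657,
         6.474790314047517, 13.842988631876649, 13.113764172190285,
         14.244292841981128, 12.470726075747763]
y = [8.9, 8.6, 9.167, 7.0, 7.03, 7.27, 7.43, 7.27, 7.37, 7.03,
     15.03, 13.17, 13.37, 13.63]

# Element-wise squared residuals (MATLAB: (Y-Y_hat).^2)
sq_res = [(yi - fi) ** 2 for yi, fi in zip(y, y_hat)]
n_obs = len(sq_res)  # N_obs = 14

# Form 1: sqrt(mean(...))
rmse_mean = math.sqrt(statistics.fmean(sq_res))
# Form 2: sqrt(sum(...)/N_obs)
rmse_sum = math.sqrt(sum(sq_res) / n_obs)

print(f"{rmse_mean:.4f} {rmse_sum:.4f}")  # 0.7847 0.7847
```

Both forms divide the sum of squared residuals by N_obs, so they agree up to floating-point rounding.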
But the correct calculation of RMSE for a fitted regression model divides by the residual degrees of freedom, not by the number of observations. The correct RMSE calculation is
RMSE = sqrt(sum(((Y-Y_hat).^2)/(N_obs-rankX)))
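where rankX is the rank of the model's design matrix X (the number of estimated coefficients when X has full column rank); rankX = 3 in your case.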
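Written as formulas, with SSE denoting the sum of squared residuals (the question's formula is the special case where nothing is subtracted from N_obs):

```latex
\mathrm{SSE} = \sum_{i=1}^{N_{\mathrm{obs}}} \left(Y_i - \hat{Y}_i\right)^2,
\qquad
\mathrm{RMSE}_{\text{obs}} = \sqrt{\frac{\mathrm{SSE}}{N_{\mathrm{obs}}}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{N_{\mathrm{obs}} - \operatorname{rank}(X)}}
```

With N_obs = 14 and rank(X) = 3, the denominator is 14 - 3 = 11.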
So,
RMSE = sqrt(sum(((Y-Y_hat).^2)/11))
which equals 0.8853, matching MATLAB's result.
The numerical coincidence, and a complete red herring, is that this value is very nearly equal to the square root of your incorrect value: sqrt(0.7847) ≈ 0.8858.
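Why the coincidence happens: both numbers use the same SSE, so they differ only by the factor sqrt(14/11). Writing RMSE_14 for the per-observation value and RMSE_11 for the degrees-of-freedom value (labels introduced here only for clarity):

```latex
\begin{aligned}
\mathrm{RMSE}_{11} &= \sqrt{\frac{\mathrm{SSE}}{11}}
  = \sqrt{\frac{14}{11}}\,\sqrt{\frac{\mathrm{SSE}}{14}}
  = \sqrt{\frac{14}{11}}\,\mathrm{RMSE}_{14} \\
\mathrm{RMSE}_{11} = \sqrt{\mathrm{RMSE}_{14}}
  &\iff \mathrm{RMSE}_{14} = \frac{11}{14} \approx 0.7857 \\
\sqrt{0.7847} &\approx 0.8858 \approx 0.8853
\end{aligned}
```

The per-observation value 0.7847 just happens to lie close to 11/14, and that is the entire coincidence.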
You can see where (the latest version of) MATLAB computes the MSE around lines 1436-1440 of the file LinearModel.
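To reproduce both numbers outside MATLAB, here is a small Python sketch (standard library only). The function `rmse` and its `n_coef` parameter are hypothetical names; this is not LinearModel's code, only the "divide SSE by the residual degrees of freedom" idea described above, with rankX = 3 taken from the answer.

```python
import math

# Data from the question: fitted values (Y_hat) and observed values (Y)
y_hat = [9.774614325191857, 9.453084986417043, 9.502166049524247,
         7.817755496590051, 7.031233831915310, 8.392026077578970,
         6.881255539731626, 6.488927374899896, 6.779374282307657,
         6.474790314047517, 13.842988631876649, 13.113764172190285,
         14.244292841981128, 12.470726075747763]
y = [8.9, 8.6, 9.167, 7.0, 7.03, 7.27, 7.43, 7.27, 7.37, 7.03,
     15.03, 13.17, 13.37, 13.63]


def rmse(y, y_hat, n_coef=0):
    """Return sqrt(SSE / (N_obs - n_coef)).

    n_coef=0 reproduces the per-observation formula from the question;
    n_coef=rank of the design matrix divides by the residual degrees of freedom.
    """
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    dfe = len(y) - n_coef  # residual degrees of freedom
    if dfe <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return math.sqrt(sse / dfe)


RANK_X = 3  # assumption taken from the answer; depends on the fitted model
print(f"divide by N_obs:         {rmse(y, y_hat):.4f}")          # 0.7847
print(f"divide by N_obs - rankX: {rmse(y, y_hat, RANK_X):.4f}")  # 0.8853
```

Only the denominator changes between the two calls, which is why the results differ by exactly the factor sqrt(14/11).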