MATLAB: RMSE – Different when calculated “by hand” compared to non-linear fitting.

Tags: fitnlm, non linear fitting, nonlinear, rmse, root mean squared error

I've been calculating the RMSE between "A" and "B" as follows:
RMSE1=sqrt(mean((A-B).^2))
where A is my experimental data and B is my predicted data, obtained from the fit.
I recently started using the function fitnlm to try a non-linear model, and found that the RMSE calculated by the function is different from the one I calculate "by hand" as follows:
modelfun = @(Coeff,Vars) f(Coeff,Vars);      % model function; f takes the coefficients and the predictors
beta0 = [Initial Coefficients starting point];   % placeholder for the initial coefficient guess
mdl = fitnlm(Vars,A,modelfun,beta0)
Coeff = table2array(mdl.Coefficients(:,1));  % fitted coefficient estimates
B = modelfun(Coeff,Vars);                    % predicted data from the fitted coefficients
RMSE = sqrt(mean((A-B).^2))
I was just wondering whether anyone had any insight into why this might be.
Thank you for your time.
P.S. I have also tried the following but get the same answer as RMSE1:
RMSE2=sqrt((sum((A-B).^2)/(size(A,1))))
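For completeness, here is a minimal, self-contained sketch that reproduces the same behaviour; the toy data and the exponential modelfun below are only stand-ins for my actual f and Vars:
rng(0);
Vars = (1:25)';                                   % toy predictor
A    = 2*exp(-0.1*Vars) + 0.05*randn(25,1);       % toy "experimental data"
modelfun = @(Coeff,Vars) Coeff(1)*exp(Coeff(2)*Vars);
beta0 = [1, -0.2];
mdl = fitnlm(Vars, A, modelfun, beta0);
B = predict(mdl, Vars);                           % predicted data from the fit
RMSE1 = sqrt(mean((A-B).^2))                      % "by hand" RMSE
mdl.RMSE                                          % RMSE reported by fitnlm (differs)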

Best Answer

It has been a while since I looked at this, although I recall answering the same question before on Answers, perhaps more than once.
I'd check whether the difference is just a matter of what you divide by. Comparing mean against summing and then dividing by the number of elements will OF COURSE give the same result! That is exactly what mean does, so why would you expect something different?
But suppose you have two different models, with different numbers of parameters? To compare them using RMSE, it is better to subtract off the number of parameters that were estimated, i.e. to divide by the residual degrees of freedom, which is what fitnlm does. So something like this:
RMSE = sqrt(sum((A-B).^2)/(ndata - nparam));
If you look online, you will in fact find that some people just divide by the number of data points, so effectively they compute it as
RMSE = sqrt(mean((A-B).^2));
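If you want to verify that the degrees-of-freedom version is exactly what fitnlm reports, a sketch along the following lines (with an assumed toy exponential model, not your actual f) compares both formulas against mdl.RMSE:
rng(0);
x = (1:25)';
y = 2*exp(-0.1*x) + 0.05*randn(25,1);             % assumed toy data
mdl = fitnlm(x, y, @(b,x) b(1)*exp(b(2)*x), [1, -0.2]);
res    = mdl.Residuals.Raw;                       % the A - B residuals
ndata  = mdl.NumObservations;
nparam = mdl.NumEstimatedCoefficients;
sqrt(sum(res.^2)/(ndata - nparam))                % matches mdl.RMSE
sqrt(mean(res.^2))                                % the "by hand" value, divides by ndata
mdl.RMSE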
Really, the difference is not enough to worry about. Why do I say this?
Lack of fit in a model is often a significant part of the residuals. It may well be larger than the noise in many models. So if you are trying to use RMSE in any true statistical sense, you may be making a serious error anyway. The difference of a few percent in an estimate of RMSE is not going to matter that much.
For example, suppose you have 25 data points, and 3 parameters to estimate.
1/sqrt(25 -3)
ans =
0.21320071635561
1/sqrt(25)
ans =
0.2
So one estimate of RMSE will be roughly 6.6% higher than the other. As long as you consistently use the same version of the RMSE estimator each time, it is simply not worth worrying about. If you might make a serious decision based on a 6.6% difference in an estimate of RMSE, then you are doing something wrong!
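In general the gap between the two estimators is just the factor sqrt(ndata/(ndata - nparam)); a quick check with the same 25 points and 3 parameters makes that explicit:
ndata = 25; nparam = 3;
sqrt(ndata/(ndata - nparam))                      % about 1.066, i.e. roughly 6.6% higher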