Solved – compare models from linear regression and nonlinear regression using RMSE

r, regression

I am comparing multiple published equation forms, refit with independent data. I'm trying to be true to the original authors' methods as much as possible. Therefore, I have 3 linear equations (fit in R using lm()), two of which use transformed Y-variables, and one equation fit using nonlinear regression (fit in R using the gnls() function).

In all cases I'm weighting the residual variance structure using the inverse of one of the predictors to account for observed heteroskedasticity.

I have been evaluating the models with R² and RMSE, using back-transformed data for the two models with transformed Y-variables.

I've calculated RMSE "by hand" using the following equation:

 RMSE <- sqrt(sum(residuals(Equation)^2) / (length(residuals(Equation)) - 2))

Should I use similar code to calculate RMSE for the linear and nonlinear regression models? Is the metric still a valid statistic for comparison, or am I missing some important assumption?

Edited: I initially stated that I was also comparing models using AIC; I later recalled that AIC would not be appropriate if the Y-variables were transformed because the models would be estimating different things.

Best Answer

  • RMSE is certainly appropriate for nonlinear models as well.
  • However, the RMSE expressions I know take a plain mean of the squared residuals, so there is no -2. (The -2 looks like the degrees of freedom for a linear model; the degrees of freedom for a nonlinear model would be different.)
  • In general, I'd not use the residuals for calculating RMSE but rather use independent test cases to avoid an optimistic bias.
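
The three points above can be put together in a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated, the names `dat`, `train`, `test`, `w`, `fit_lm`, `fit_nl` and `rmse()` are hypothetical, and base R's nls() stands in for gnls() so that the example needs no extra packages. The idea is to use one mean-based RMSE function for every model and to score each model on held-out cases, on the original scale of Y.

```r
# Simulated data for illustration only; not the asker's data or equations.
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 * x^1.5 * exp(rnorm(n, sd = 0.2))
dat <- data.frame(x = x, y = y, w = 1 / x)  # w: inverse-predictor weights

# Hold out a quarter of the rows as independent test cases.
test_idx <- sample(n, n / 4)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# RMSE as a plain mean of squared errors (no degrees-of-freedom correction).
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Linear model on a log-transformed response, weighted by 1/x.
fit_lm <- lm(log(y) ~ log(x), data = train, weights = w)

# Nonlinear model on the original scale; nls() is used here in place of gnls().
# Starting values are taken from the log-log fit.
fit_nl <- nls(y ~ a * x^b, data = train, weights = w,
              start = list(a = exp(coef(fit_lm)[[1]]), b = coef(fit_lm)[[2]]))

# Back-transform the log-scale predictions so both models are scored in the
# units of y. This is a naive exp() back-transform with no bias correction.
pred_lm <- exp(predict(fit_lm, newdata = test))
pred_nl <- predict(fit_nl, newdata = test)

rmse_lm <- rmse(test$y, pred_lm)
rmse_nl <- rmse(test$y, pred_nl)
print(c(linear_log = rmse_lm, nonlinear = rmse_nl))
```

Because both values come from the same function, the same test rows and the same scale, they can be compared directly, whatever method was used to fit each model.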