Solved – An alternative to R-squared to compare goodness of fit across different datasets? Slope makes them incomparable

goodness of fit, least squares, r-squared

I'm fitting the degradation of a signal. Some instruments degrade faster than others, so the slope varies a bit. This makes it difficult to compare how good the fits are relative to each other.

See the example below: two scatter plots with a comparable spread of data (shared x and y axes). Eyeballing the data, I expected the left one to have a slightly worse fit, but I did not expect the R-squared values to differ by 1.5 orders of magnitude!

Similar plots, wildly different R-squared.

I'm starting to understand that R-squared is also a measure of X-Y correlation. So I guess for this case, where flat signals are possible, it's not very useful… or I've got the wrong approach altogether.

Is there an alternative for comparing the goodness of these fits? Something relative is OK. I've been wondering whether I could somehow set a boundary/range… but I'm not familiar enough with the subject and I'd probably just be badly reinventing the wheel 🙂

Thanks in advance for any advice!

Best Answer

You may want to consider a measure of accuracy that measures the vertical distance between the fitted line and the data. There are a variety of these measures, including Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square Error (RMSE).
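For reference, these three measures are usually defined as below, with observed values $y_i$, fitted values $\hat{y}_i$, and $n$ data points. This is a sketch that assumes the common divide-by-$n$ convention; some texts divide by the residual degrees of freedom instead.

```latex
\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\]
```

All three depend only on the residuals $y_i - \hat{y}_i$, so a steeper or flatter fitted line does not change them as long as the scatter around the line is the same.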

The following is an example in R. The amount of vertical error is the same for model and model2, but model has zero slope and zero r-squared, while model2 has an obvious slope and a high r-squared. You can compare the MAE, MSE, or RMSE statistics instead. (Caveat: I am the author of the accuracy function.)

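# Install rcompanion if it is missing, then load it
if(!require(rcompanion)){install.packages("rcompanion")}
library(rcompanion)

# Flat signal: Y scatters around 5 with no trend in X
X = c(1,2,3,4,5,6,7,8,9,10)
Y = c(5,6,4,5,5,5,5,4,6,5)

model = lm(Y ~ X)

plot(Y ~ X)

# Report MAE, MSE, RMSE, etc. for the fit
accuracy(list(model), plotit=FALSE)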

    ### Min.max.accuracy MAE   MAPE MSE  RMSE NRMSE.mean NRMSE.median NRMSE.mean.accuracy NRMSE.median.accuracy Efron.r.squared CV.prcnt
    ###            0.927 0.4 0.0833 0.4 0.632      0.126        0.126               0.874                 0.874               0     12.6           

# Sloped signal: the same scatter added to a trend of slope 1
X = c(1,2,3,4,5,6,7,8,9,10)
Z = X + c(5,6,4,5,5,5,5,4,6,5)

model2 = lm(Z ~ X)

plot(Z ~ X)

accuracy(list(model2), plotit=FALSE)

   ### Min.max.accuracy MAE   MAPE MSE  RMSE NRMSE.mean NRMSE.median NRMSE.mean.accuracy NRMSE.median.accuracy Efron.r.squared CV.prcnt
   ###            0.961 0.4 0.0418 0.4 0.632     0.0602       0.0602                0.94                  0.94           0.954     6.02
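
The same comparison can be reproduced in base R, which also shows where the r-squared gap comes from. The sketch below is only illustrative: `fit_errors` is a hypothetical helper, not part of rcompanion, and it assumes the divide-by-n convention for MSE. It uses r-squared = 1 - SSE/SST, where SSE is the residual sum of squares and SST is the total sum of squares of the response around its mean.

```r
# Same data as in the example above
X <- 1:10
noise <- c(5, 6, 4, 5, 5, 5, 5, 4, 6, 5)
Y <- noise        # flat signal
Z <- X + noise    # sloped signal with identical scatter

# Hypothetical helper (not part of rcompanion): error metrics from an lm fit
fit_errors <- function(model) {
  res <- residuals(model)                # observed minus fitted
  obs <- fitted(model) + res             # recover the observed response
  sse <- sum(res^2)                      # residual sum of squares
  sst <- sum((obs - mean(obs))^2)        # total sum of squares
  c(MAE  = mean(abs(res)),
    MSE  = mean(res^2),                  # divides by n (assumed convention)
    RMSE = sqrt(mean(res^2)),
    SSE  = sse,
    SST  = sst,
    R2   = 1 - sse / sst)
}

errors <- rbind(model  = fit_errors(lm(Y ~ X)),
                model2 = fit_errors(lm(Z ~ X)))
print(round(errors, 3))
```

Both fits have SSE = 4, so MAE, MSE, and RMSE are identical. SST is 4 for the flat signal but 86.5 for the sloped one, because the trend itself adds variance to the response. That alone moves r-squared from 0 to about 0.954, even though the scatter around the line has not changed.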