[Math] On normalized error measures

data analysis, mean square error, statistics

I have function values $f_1,\ldots,f_n$ that are approximated by data $y_1,\ldots,y_n$. I am looking for a measure that describes the error in the data $y_1,\ldots,y_n$ and I want the measure to take values between $0$ and $1$.

I am familiar with the Root Mean Squared Error (RMSE) or RMSD (D for deviation):
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left(f_i-y_i\right)^2}} $$
Since this is not normalized, I started searching for a normalized version, about which Wikipedia says:

Normalizing the RMSE facilitates the comparison between datasets or models with different scales. Though there is no consistent means of normalization in the literature, the range of the measured data defined as the maximum value minus the minimum value is a common choice:
$$\mathrm{NRMSE}=\frac{\mathrm{RMSE}}{y_{\mathrm{max}}-y_{\mathrm{min}}}$$

Not that I consider Wikipedia a trustworthy source, but it seems to me that this does not confine the error to $[0,1]$, since dividing by the range can even blow up the error measure (if $y_{\mathrm{max}}-y_{\mathrm{min}}$ is small).
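
To make the concern concrete, here is a minimal Python sketch of the RMSE and the range-normalized NRMSE from the formulas above. The function names `rmse` and `nrmse` and the sample values are my own assumptions for illustration, chosen so that the data have a narrow range.

```python
import math

def rmse(f, y):
    """Root mean squared error between reference values f and data y."""
    return math.sqrt(sum((fi - yi) ** 2 for fi, yi in zip(f, y)) / len(f))

def nrmse(f, y):
    """RMSE divided by the range of the data y (max minus min)."""
    return rmse(f, y) / (max(y) - min(y))

# Hypothetical values: the data y are far from f but span a narrow range.
f = [0.0, 0.0, 0.0]
y = [1.0, 1.1, 0.9]

print(rmse(f, y))   # about 1.003
print(nrmse(f, y))  # about 5.02, far above 1
```

So dividing by the range rescales the error but does not bound it by $1$.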

Another approach would be to not consider the absolute error, but the error as a percentage. Something of the form
$$\frac{1}{n}\sum_{i=1}^n{\left|\frac{f_i-y_i}{f_i}\right|}\ \ \ \mbox{ or } \ \ \ \frac{1}{n}\sqrt{\sum_{i=1}^n{\left(\frac{f_i-y_i}{f_i}\right)^2}}.$$
But this again does not necessarily take values between $0$ and $1$.
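
A small Python sketch, under made-up sample values, shows both percentage-style formulas exceeding $1$. The function names `mape` and `scaled_relative_l2` are hypothetical labels for the two expressions above, and every $f_i$ is assumed to be nonzero.

```python
import math

def mape(f, y):
    """First formula: mean of |f_i - y_i| / |f_i| (requires every f_i != 0)."""
    return sum(abs((fi - yi) / fi) for fi, yi in zip(f, y)) / len(f)

def scaled_relative_l2(f, y):
    """Second formula exactly as written above: (1/n) * sqrt(sum of squared relative errors)."""
    return math.sqrt(sum(((fi - yi) / fi) ** 2 for fi, yi in zip(f, y))) / len(f)

# Hypothetical values: each approximation is four times the reference value.
f = [1.0, 2.0]
y = [4.0, 8.0]

print(mape(f, y))                # 3.0
print(scaled_relative_l2(f, y))  # about 2.12
```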

Now my question is: are there other (perhaps better) measures to describe a normalized error? In other words, given values $f_1,\ldots,f_n$ and approximations $y_1,\ldots,y_n$, is there a measure to describe the error in these approximations that always takes values between $0$ and $1$?

Best Answer

I would suggest considering the so-called "symmetric" mean absolute percentage error (MAPE), which is defined as

$$\frac{1}{n}\sum_{i=1}^n{\left|\frac{f_i-y_i}{(f_i+y_i)/2}\right|}$$

where each error is scaled by the average of the reference and the observed values. In contrast with the classical MAPE (usually called "regular"), which ranges between $0$ and $\infty$, the symmetric MAPE expressed as a percentage ranges between $0\%$ and $200\%$. Note that this bound relies on $f_i$ and $y_i$ having the same sign (for instance, all values nonnegative); with mixed signs the denominator can become arbitrarily small and the ratio is no longer bounded. This measure is somewhat linked to the commonly used Bland-Altman analysis. It also avoids the "asymmetric" nature of the regular MAPE, and it is less sensitive to outliers.

If you take half of this measure (which corresponds to the same formula as above, but without the division by $2$ in the denominator) and express it as a decimal number, you get a simple and reliable measure of error that ranges between $0$ and $1$, with $0$ meaning perfect agreement and $1$ meaning maximal (theoretical) disagreement. Also consider that, in the large majority of scenarios and applications, the symmetric MAPE values tend to fall in the lower range, i.e. rather near $0$.
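
As a rough sketch of this halved measure in Python: the function name `half_smape` and the sample values are assumptions of mine, and the code assumes all values are nonnegative and that no pair $(f_i, y_i)$ is zero in both entries, since the denominator would then vanish.

```python
def half_smape(f, y):
    """Mean of |f_i - y_i| / (f_i + y_i), i.e. half of the symmetric MAPE.

    Assumes f_i >= 0, y_i >= 0 and f_i + y_i > 0 for every i.
    """
    return sum(abs(fi - yi) / (fi + yi) for fi, yi in zip(f, y)) / len(f)

f = [1.0, 2.0, 4.0]  # hypothetical reference values

print(half_smape(f, f))                # 0.0, perfect agreement
print(half_smape(f, [0.0, 0.0, 0.0]))  # 1.0, maximal disagreement
print(half_smape(f, [3.0, 2.0, 4.0]))  # about 0.167, only the first value is off
```

For data with mixed signs, a common variant puts $|f_i|+|y_i|$ in the denominator, which keeps the result in $[0,1]$.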
