Solved – the difference between an RMSE and RMSLE (logarithmic error)

Tags: error, measurement-error, rms

RMSE vs RMSLE

Root Mean Squared Error (RMSE) and Root Mean Squared Logarithmic Error (RMSLE) are both metrics that measure the difference between the values predicted by a machine learning model and the actual values.

  • But what is the purpose of the logarithm in RMSLE?

  • Does a high RMSE imply low RMSLE?

Can somebody explain the differences between RMSE and RMSLE in detail, and how each metric works under the hood?

  • When would one use RMSE over RMSLE?
  • What are the advantages/disadvantages of using RMSE over RMSLE?

Best Answer

RMSLE is an error metric that is sometimes used to evaluate predictions of random variables. If you have a vector of random variables $\mathbf{x} = (x_1,...,x_n)$ and you make the predictions $\hat{\mathbf{x}} = (\hat{x}_1,...,\hat{x}_n)$ then the RMSLE of these predictions is given by:

$$\begin{equation} \begin{aligned} \text{RMSLE} (\mathbf{x},\hat{\mathbf{x}}) &= \text{RMSE} (\log(\mathbf{x} + \mathbf{1}),\log(\hat{\mathbf{x}} + \mathbf{1})) \\[6pt] &= \sqrt{\frac{1}{n} \sum_{i=1}^n [\log (x_i+1) - \log (\hat{x}_i+1) ]^2} \\[6pt] &= \sqrt{\frac{1}{n} \sum_{i=1}^n \Big( \log \Big( \frac{x_i+1}{\hat{x}_i+1} \Big) \Big)^2 }. \\[6pt] \end{aligned} \end{equation}$$

(Note that here I am using the notational convention of applying $\log$ element-wise to a vector.) As you can see, all this metric really does is shift the true values and predictions onto a log scale before computing the RMSE. The last form also shows that the error for each observation depends only on the ratio $(x_i+1)/(\hat{x}_i+1)$, so RMSLE measures relative rather than absolute error. The metric requires that the values and predictions all be greater than negative one, though in practice it is usually used when both the true values and the predictions are non-negative.
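To make this concrete, here is a minimal NumPy sketch (the function names `rmse` and `rmsle` and the example values are my own illustration, not from the question) showing that RMSLE is just RMSE computed on $\log(1+x)$-transformed values, and why the logarithm emphasises relative rather than absolute error:

    import numpy as np

    def rmse(y_true, y_pred):
        """Root Mean Squared Error."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

    def rmsle(y_true, y_pred):
        """RMSLE = RMSE applied to log(1 + x) transformed values."""
        return rmse(np.log1p(y_true), np.log1p(y_pred))

    # Hypothetical example: every prediction is off by the same absolute
    # amount (50), so RMSE is exactly 50, but RMSLE is dominated by the
    # observations where 50 is a large *relative* error (the small true values).
    y_true = np.array([60, 80, 90, 750])
    y_pred = np.array([110, 130, 140, 800])

    print(rmse(y_true, y_pred))   # 50.0
    print(rmsle(y_true, y_pred))  # much larger contribution from the small values

For non-negative inputs, the square root of scikit-learn's `sklearn.metrics.mean_squared_log_error` should agree with the `rmsle` sketch above.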
