Solved – How to interpret RMSLE (Root Mean Squared Logarithmic Error)


I've been doing a machine learning competition where they use RMSLE (Root Mean Squared Logarithmic Error) to evaluate performance at predicting the sale price of a category of equipment. The problem is I'm not sure how to interpret the success of my final result.

For example, if I achieved an RMSLE of $1.052$, could I exponentiate it and interpret the result like an RMSE (i.e. $e^{1.052}=2.863=\text{RMSE}$)?

Could I then say that my predictions were $\pm \$2.863$ on average from the actual prices? Or is there a better way to interpret the metric? Or can the metric even be interpreted at all, other than by comparing it to the RMSLEs of other models?

Best Answer

I haven't seen RMSLE before, but I'm assuming it's $\sqrt{ \frac{1}{N} \sum_{i=1}^N (\log(x_i) - \log(y_i))^2 }$.

Thus exponentiating it won't give you RMSE, it'll give you

$e^{\sqrt{ \frac{1}{N} \sum_{i=1}^N (\log(x_i) - \log(y_i))^2 }} \ne \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - y_i)^2}$.

If we take the log of both sides, we get the RMSLE versus $\frac{1}{2} \log \left( \frac{1}{N} \sum_{i=1}^N (x_i - y_i)^2 \right)$, which is clearly not the same thing.
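A quick numerical check makes the mismatch concrete. The following is only a minimal sketch, assuming the RMSLE definition given above (plain logs, no $+1$ offset). The function names `rmse` and `rmsle` and the price values are invented for illustration and are not taken from the competition.

```python
import math

def rmse(pred, actual):
    """Root mean squared error on the original scale."""
    n = len(pred)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)

def rmsle(pred, actual):
    """RMSE of the log-transformed values (assumes every value is > 0)."""
    n = len(pred)
    return math.sqrt(
        sum((math.log(p) - math.log(a)) ** 2 for p, a in zip(pred, actual)) / n
    )

# Hypothetical prices, made up purely for this example.
actual = [10000.0, 20000.0, 40000.0]
pred = [20000.0, 10000.0, 80000.0]  # every prediction is off by a factor of 2

score = rmsle(pred, actual)
print("RMSLE:     ", score)                # every log-difference is +/- log(2)
print("exp(RMSLE):", math.exp(score))      # a unitless ratio, close to 2
print("RMSE:      ", rmse(pred, actual))   # in price units, in the tens of thousands
```

In this toy data the exponentiated RMSLE comes out at roughly 2, while the RMSE is in the tens of thousands of dollars, so the two are not even measured in the same units.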

Unfortunately, there isn't a good easy relationship in general (though someone smarter than me / thinking about it harder than me could probably use Jensen's inequality to figure out some relationship between the two).

It is, of course, the RMSE of the log-transformed variable, for what that's worth. So if you want a rough sense of the spread of the errors, you can read it as the spread of the log-ratio between prediction and true value: an RMSLE of 1.052 means that a "typical" prediction is off by a factor of about $e^{1.052} \approx 2.86$, i.e. it is roughly $2.86$ times the true value, or $1/2.86$ of it. Of course, that's not quite what RMSE means....
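To make the multiplicative reading concrete, here is a small sketch that turns the RMSLE from the question into a "times / divided by" factor and applies it to a price. The true price of 10,000 is a hypothetical value chosen only for illustration. The resulting band is a rough "typical error" range, not a confidence interval.

```python
import math

rmsle_value = 1.052             # the score quoted in the question
factor = math.exp(rmsle_value)  # typical multiplicative error, about 2.863

# Hypothetical true price, used only to illustrate the band.
true_price = 10000.0
low = true_price / factor       # a prediction that is "typically" too low
high = true_price * factor      # a prediction that is "typically" too high

print(f"typical factor: x{factor:.3f} (or x{1 / factor:.3f})")
print(f"for a true price of {true_price:.0f}: roughly {low:.0f} to {high:.0f}")
```

Note that the band is symmetric on the log scale but not in dollars, which is why a statement like "$\pm$ some fixed amount" does not follow from an RMSLE.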