Regression – Understanding Low MAE, MSE, RMSE Scores and Negative R² in Regression Models

Tags: mae, mse, r-squared, random-forest, regression

I am using a random forest regression model with leave-one-out cross-validation for my prediction task. I am having a difficult time understanding why my $R^2$ score is negative when the MSE, RMSE, and MAE are all very low.
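A minimal sketch of this setup, with a hypothetical feature matrix `X` and target `y` standing in for my actual data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Hypothetical data standing in for the real features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.uniform(size=50)

# Leave-one-out CV: each sample is predicted by a forest trained on
# all of the remaining samples.
forest = RandomForestRegressor(random_state=0)
y_pred = cross_val_predict(forest, X, y, cv=LeaveOneOut())
```

Here is a sample of my true and predicted values: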

True Value: 0.0511350891441389, Predicted Value: 0.1570743965948912
True Value: 0.1019683613090206, Predicted Value: 0.06101801962025982
True Value: 0.0722484077136202, Predicted Value: 0.12989937556879136
True Value: 0.8151465997429149, Predicted Value: 0.11910986913415476
True Value: 0.0141580461529044, Predicted Value: 0.10300264949635973
True Value: 0.0759365903712855, Predicted Value: 0.2007470535994329
True Value: 0.0168830791575889, Predicted Value: 0.0867039544973983
True Value: 0.0280480358233258, Predicted Value: 0.3334096609357363
True Value: 0.0119374073771543, Predicted Value: 0.0456333839555339
True Value: 0.0879195861169952, Predicted Value: 0.12158770472179008
True Value: 0.1877777777777777, Predicted Value: 0.1636636091524143
True Value: 0.1319864052287581, Predicted Value: 0.05390845919789602

These are the scores:

Mean Squared Error (MSE): 0.035323866926619006
Mean Absolute Error (MAE): 0.1288933724806987
Root Mean Squared Error (RMSE): 0.1879464469646048
R-squared (R2) Score: -0.4162881141285679
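For reference, this is a sketch of how I compute those metrics with `sklearn.metrics`, shown here on the twelve sampled pairs above; since the reported scores come from the full prediction set, the printed numbers will not match them exactly:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# The twelve (true, predicted) pairs sampled above.
y_true = np.array([
    0.0511350891441389, 0.1019683613090206, 0.0722484077136202,
    0.8151465997429149, 0.0141580461529044, 0.0759365903712855,
    0.0168830791575889, 0.0280480358233258, 0.0119374073771543,
    0.0879195861169952, 0.1877777777777777, 0.1319864052287581,
])
y_pred = np.array([
    0.1570743965948912, 0.06101801962025982, 0.12989937556879136,
    0.11910986913415476, 0.10300264949635973, 0.2007470535994329,
    0.0867039544973983, 0.3334096609357363, 0.0456333839555339,
    0.12158770472179008, 0.1636636091524143, 0.05390845919789602,
])

mse = mean_squared_error(y_true, y_pred)
print("MSE: ", mse)
print("MAE: ", mean_absolute_error(y_true, y_pred))
print("RMSE:", np.sqrt(mse))  # RMSE is the square root of MSE
print("R2:  ", r2_score(y_true, y_pred))
```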

I am also providing my visualization of the actual vs. predicted values:

[Figure: actual vs. predicted values plot]

Best Answer

If you're getting $R^2<0$, then I assume you're using the equation below, which is the calculation used by `sklearn.metrics.r2_score`.

$$ R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i-\hat y_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar y\right)^2} = 1 - \frac{N\times \mathrm{RMSE}^2}{\sum_{i=1}^{N}\left(y_i-\bar y\right)^2} $$

This $R^2$ is a function of both the (R)MSE and the total sum of squares. The numerator $N\times \mathrm{RMSE}^2$ is just the residual sum of squares, so no matter how small the (R)MSE is in absolute terms, you will get $R^2<0$ whenever it exceeds the total sum of squares in the denominator.
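A quick numeric check of that identity, using arbitrary illustrative values (not the data from the question):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Arbitrary values, just to verify R² = 1 − N·MSE / Σ(yᵢ − ȳ)².
y = np.array([0.05, 0.10, 0.07, 0.82, 0.01])
y_hat = np.array([0.16, 0.06, 0.13, 0.12, 0.10])

n = len(y)
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
mse = mean_squared_error(y, y_hat)    # equals Σ(yᵢ − ŷᵢ)² / N

print(1 - n * mse / ss_tot)  # manual formula
print(r2_score(y, y_hat))    # sklearn gives the same value
```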

The interpretation of $R^2<0$ is that your predictions have a higher MSE than that of a naïve model that always predicts the overall mean, $\bar y$. That is, the predictions are not very good. Based on the graph, that appears to be the case, likely driven by that point at the far right that your model misses badly.
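To make the comparison with the naïve model concrete, here is a toy example (values loosely patterned on the sample in the question, not the actual data): the absolute errors are small, yet the model still loses to always predicting $\bar y$.

```python
import numpy as np

# Toy values loosely patterned on the question's sample.
y_true = np.array([0.05, 0.10, 0.07, 0.82, 0.01, 0.08])
y_pred = np.array([0.16, 0.06, 0.13, 0.12, 0.10, 0.20])

mse_model = np.mean((y_true - y_pred) ** 2)
mse_naive = np.mean((y_true - y_true.mean()) ** 2)  # always predict the mean

# R² compares the two: it goes negative as soon as the model's MSE
# exceeds the naïve model's MSE.
print(mse_model, mse_naive, 1 - mse_model / mse_naive)
```

Here the single badly missed large value (0.82 predicted as 0.12) is enough to push the model's MSE above the naïve baseline, mirroring the far-right point in your plot.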