regression – Why High MSE/MAE/MAPE Values Occur When R2 Score is Very Good

accuracy, machine learning, mae, r-squared, regression

I am applying different regression models (RF, KNN, etc.) to some well-known datasets (bike sharing, diabetics, etc.). The R2 values are very good, which suggests that the models are working well (though a high R2 does not guarantee this in every case). So I also computed MSE, MAE, and MAPE. But the MAE/MAPE/MSE values are very high, which seems to mean that the models' predictions are very bad and very far from the actual values (true labels).

The accuracy scores of the datasets

Name        MAE      MAPE      R2     MSE  
Bike        24.56    0.34      0.95   1615
Diabetics   0.06     2321.20   0.87   0.03  

The formula used to calculate MAPE

MAPE = np.mean(np.abs(predictions - y_test) / (y_test + 1e-5))

I would like to know how, when R2 is very high, the model's predictions can at the same time be very bad (as the MSE/MAPE/MAE scores seem to indicate).

The description of datasets

Name       Count  Mean    Std     Min   Max
Bike       17379  189.46  181.38  1.00  977
Diabetics  768    0.34    0.47    0     1

Best Answer

I don’t see how you can tell from those metrics that the results are “very bad”. Compare the metrics to things like the mean, range, or standard deviation of the data: in both cases, RMSE (the square root of MSE, which puts the error back on the scale of the target) is much smaller than the variability of the data.
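As a rough check of this comparison, the sketch below plugs in the figures from the question's two tables. It assumes the reported standard deviations (computed on the full datasets) are close to those of the test sets, so the "implied" R2 from 1 - MSE / Std^2 is only an approximation. The names `reported` and `summarize` are made up for this illustration.

```python
import math

# Figures copied from the question's tables. Std is for the whole dataset,
# so the implied R2 only approximates the R2 measured on the test set.
reported = {
    "Bike": {"mse": 1615.0, "std": 181.38, "r2": 0.95},
    "Diabetics": {"mse": 0.03, "std": 0.47, "r2": 0.87},
}

def summarize(mse, std):
    """Return RMSE, RMSE as a fraction of std, and R2 implied by MSE/variance."""
    rmse = math.sqrt(mse)
    implied_r2 = 1.0 - mse / std ** 2
    return rmse, rmse / std, implied_r2

for name, row in reported.items():
    rmse, ratio, implied_r2 = summarize(row["mse"], row["std"])
    print(
        f"{name}: RMSE={rmse:.2f}, RMSE/std={ratio:.2f}, "
        f"implied R2={implied_r2:.2f}, reported R2={row['r2']}"
    )
```

Under that assumption, the implied R2 comes out at roughly 0.95 for Bike and 0.86 for Diabetics, in line with the reported scores. The "high" MSE and the high R2 are telling the same story.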

The metrics have no absolute scale, so you need some kind of benchmark for them. The most trivial model minimizing squared error predicts the mean for all samples; in that case, RMSE equals the standard deviation, and your model is better than this. For MAE, the trivial model predicts the median, with MAE equal to the mean absolute deviation around the median (MAD); my guess is that you’re still better. For a less trivial benchmark, you can compare the results to something like linear regression.
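The trivial baselines described above can be sketched in a few lines of NumPy. The target `y` here is synthetic (a skewed gamma sample chosen purely for illustration, not either dataset from the question), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed, non-negative target standing in for real labels.
y = rng.gamma(shape=2.0, scale=50.0, size=1000)

# Constant predictors: the mean (optimal for squared error)
# and the median (optimal for absolute error).
mean_pred = np.full_like(y, y.mean())
median_pred = np.full_like(y, np.median(y))

rmse_mean = np.sqrt(np.mean((y - mean_pred) ** 2))
mae_mean = np.mean(np.abs(y - mean_pred))
mae_median = np.mean(np.abs(y - median_pred))

print(f"RMSE of mean predictor:  {rmse_mean:.3f}")
print(f"Std of y (ddof=0):       {np.std(y):.3f}")
print(f"MAE of mean predictor:   {mae_mean:.3f}")
print(f"MAE of median predictor: {mae_median:.3f}")
```

The first two printed values coincide, and the median predictor's MAE is never larger than the mean predictor's. A model is only worth keeping if its RMSE and MAE on held-out data are clearly below these baseline numbers.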

The only exception is MAPE, which is very high for the second dataset. But that dataset contains zeros, and in such a case you should not use MAPE as a metric: anything divided by a value close to zero becomes extremely large and destroys the metric. For example, say that the true value is 0 and you predict the mean for it:

> abs(0.34 - 0) / (0 + 1e-5)
[1] 34000

See "What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?" for more details, but MAPE is a tricky metric that should not be used blindly.
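To see how a single zero label dominates the average, here is a minimal sketch that reuses the MAPE formula from the question (including its `1e-5` offset). The four labels and predictions are invented for illustration, and the helper name `mape` is an assumption of this example.

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-5):
    """MAPE as written in the question: mean(|pred - true| / (true + eps))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_pred - y_true) / (y_true + eps))

# Made-up 0/1 labels, with one zero label predicted as the dataset mean (0.34).
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.34, 0.9, 0.8, 0.0])

# Per-sample terms show which observation drives the average.
per_sample = np.abs(y_pred - y_true) / (y_true + 1e-5)
mae = np.mean(np.abs(y_pred - y_true))

print("per-sample terms:", per_sample)
print("MAPE:", mape(y_true, y_pred))
print("MAE:", mae)
```

The first term is about 34000 while the others are at most about 0.2, so the mean is driven almost entirely by the zero label even though the MAE is small. With zeros in the target, a scale-based comparison such as MAE or RMSE against a baseline is the more informative choice.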
