Regression Metrics – Best Metrics for Cases with Very Large Values and Zero Values

customer-lifetime-value · model-evaluation · regression

I'm working on a CLTV problem where the objective is to predict customers' future spending given their past behaviour. Following arXiv:1912.07753, Section 4 (EVALUATION METRICS), I'm measuring calibration (the difference between actual and predicted values) and discrimination (the ranking of users by CLV).

I'm having a hard time finding a good calibration metric, because the values can be extremely large for some customers (making squared-error-based metrics such as R-squared and MSE less meaningful) or exactly zero (making percentage-based metrics such as the MAPE impossible to compute).
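To make both failure modes concrete, here is a small numpy sketch with made-up toy values (the customer amounts are purely illustrative, not from any real dataset):

```python
import numpy as np

y_true = np.array([0.0, 5.0, 12.0, 10_000.0])  # a zero spender and one very large spender
y_pred = np.array([1.0, 6.0, 10.0, 9_000.0])

# MAPE divides by the actual value, so any exact zero produces an infinite term
with np.errstate(divide="ignore"):
    ape = np.abs(y_true - y_pred) / y_true
print(ape)  # first entry is inf

# Squared errors are dominated by the single large customer
sq = (y_true - y_pred) ** 2
print(sq / sq.sum())  # nearly all of the weight sits on the last entry
```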

The only good metric I could find is the MAE, but it is scale-dependent and therefore very dataset-specific, so it doesn't allow comparison with results on other datasets.

What metrics would you recommend for regression problems with 0 and very large values?

Best Answer

It depends on what functional of the future distribution you want to elicit.

Put differently, future outcomes follow some probability distribution (which, judging from your description, may be heavy-tailed and/or zero-inflated), and the point forecast you want to evaluate is a one-number summary of this distribution. This holds even if you never look at the distribution explicitly: it is always there, lurking under the surface.

The issue is that different error measures elicit different one-number summaries of the underlying distribution. The MSE is minimized in expectation by the expectation (mean) of the distribution. The MAE is minimized by its median. (That the MSE is more strongly influenced by the tail of the distribution than the MAE is just another way of saying that the mean is more strongly influenced by the tail than the median.) A quantile loss is optimized by the corresponding quantile.
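You can see this elicitation effect numerically. The sketch below simulates a zero-inflated, heavy-tailed "future spend" (the 70% zero rate and lognormal parameters are arbitrary choices for illustration) and searches over constant point forecasts: the MSE-optimal constant lands near the mean, while the MAE-optimal constant lands on the median, which here is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "future spend": 70% of customers spend nothing, the rest are heavy-tailed
n = 50_000
y = np.where(rng.random(n) < 0.7, 0.0,
             rng.lognormal(mean=3.0, sigma=1.5, size=n))

# Search over constant point forecasts c and see which one each loss prefers
grid = np.linspace(0.0, np.quantile(y, 0.999), 1000)
mse_opt = grid[np.argmin([np.mean((y - c) ** 2) for c in grid])]
mae_opt = grid[np.argmin([np.mean(np.abs(y - c)) for c in grid])]

print("mean(y)   =", y.mean(), "   MSE-optimal constant =", mse_opt)
print("median(y) =", np.median(y), "   MAE-optimal constant =", mae_opt)
```

Note that a forecast of exactly zero for every customer "wins" under the MAE here, which is rarely what a CLTV model is supposed to deliver.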

One consequence is that different point forecasts will be optimal under different error measures. Another is that your OLS regression likely optimizes the MSE as its objective function, so it does not make much sense to evaluate forecasts from an OLS model using the MAPE. (The MAE makes sense if you believe your errors are symmetric, which again does not seem to be the case here.)
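If a tail quantile is the functional you care about (say, flagging likely high spenders), the matching error measure is the pinball (quantile) loss. A minimal sketch, again with arbitrary lognormal parameters, verifying that the pinball loss at level tau is minimized near the sample tau-quantile:

```python
import numpy as np

def pinball(y, c, tau):
    """Pinball (quantile) loss of a constant forecast c at level tau."""
    d = y - c
    return np.mean(np.maximum(tau * d, (tau - 1.0) * d))

rng = np.random.default_rng(1)
y = rng.lognormal(mean=3.0, sigma=1.5, size=50_000)  # heavy-tailed toy spend

tau = 0.9
grid = np.linspace(0.0, np.quantile(y, 0.999), 2000)
best = grid[np.argmin([pinball(y, c, tau) for c in grid])]

print("pinball-optimal constant:", best)
print("sample 0.9 quantile:     ", np.quantile(y, tau))  # the two should be close
```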

So the question should first be what functional you are interested in, and only after you have given this some thought should you pick an appropriate error measure. Which functional solves your problem, in turn, depends on what you want to do with the point forecast afterwards.

More information can be found at What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?, at Why use a certain measure of forecast error (e.g. MAD) as opposed to another (e.g. MSE)? and in Kolassa (2020).
