Solved – Machine Learning – Prediction Interval – Cheating

boosting, prediction-interval, predictive-models

I work at a company that is trying to use machine learning methods, in particular gradient boosting and neural networks, to make predictions on stock market data; that is, using historical data to predict what the price of a stock/asset will be $x$ time periods from the present. We are using these methods for regression rather than classification, and, having been trained in the experimental sciences, it is my habit to always report a regression prediction with a $\pm$, giving a prediction interval rather than just a single number. My manager (who doesn't seem very technical) told me that this is unacceptable/cheating, as I'm using an interval to cover for the fact that my algorithm is unable to produce a single correct number. I'm a bit confused by this attitude, since in the laboratory sciences (chemistry) we always cite every result with a $\pm$.

So, I was wondering what the stats experts on here think? Just out of curiosity, I checked my machine learning textbook by Hastie, Witten, et al., and they use the MSE on the test set to give a $\pm$ on the predictions in an example on gradient boosting, so it seems standard to do this…

Thanks.

Best Answer

In general, prediction intervals are considered better than point estimates. While it's great to have a good estimate of what a stock price will be tomorrow, it's much better to be able to give a range of values that the stock price is very likely to fall in.

That being said, it's generally more difficult to produce reliable prediction intervals than to merely produce point estimates with good prediction properties. For example, in many cases we can show that, with non-constant variance, we can still produce a consistent estimator of the mean of a new value even if we ignore the non-constant variance issue. However, we definitely need a reliable estimate of the variance function to produce prediction intervals.
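The first claim above can be illustrated with a small simulation (a minimal sketch with made-up toy data): even when the noise variance grows with $x$, an ordinary least-squares fit that completely ignores the heteroscedasticity still recovers the mean function well. A prediction interval, by contrast, would need the variance itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.uniform(0, 10, n)

# True mean function: y = 1 + 2x, but with noise whose standard
# deviation grows with x (non-constant variance).
true_intercept, true_slope = 1.0, 2.0
y = true_intercept + true_slope * x + rng.normal(0, 0.2 + 0.5 * x)

# Plain least squares ignores the heteroscedasticity entirely...
slope, intercept = np.polyfit(x, y, 1)

# ...yet with enough data it still recovers the mean accurately.
print(f"slope ≈ {slope:.3f}, intercept ≈ {intercept:.3f}")
```

The fitted coefficients land close to the true values of 2 and 1, which is why point prediction can get away with ignoring the variance structure while interval prediction cannot.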

I've heard of people treating this as just another level of the machine learning problem: the first level produces $\hat f(x_i)$, an estimate of $E[y_i \mid x_i]$ (the predicted values), and the second level produces $\hat V(x_i)$, an estimate of $E[(y_i - \hat f(x_i))^2 \mid x_i]$ (the variance of the response given the inputs). In theory this should work (given enough data and a stable underlying function), but in practice it must be handled with a lot of care, as variance estimates are inherently much less stable than mean estimates. In short, you should expect to need much more data to accurately estimate $\hat V(x_i)$ than $\hat f(x_i)$.
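The two-level idea can be sketched as follows (an illustrative implementation on synthetic heteroscedastic data, using scikit-learn's `GradientBoostingRegressor` for both levels; the choice of learner and the Gaussian-error assumption behind the 1.96 multiplier are assumptions, not part of the original answer):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000
X = rng.uniform(0, 10, size=(n, 1))
# Synthetic data with variance that grows with x.
y = np.sin(X[:, 0]) + rng.normal(0, 0.1 + 0.1 * X[:, 0])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 1: estimate the conditional mean, f_hat(x).
mean_model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Level 2: estimate the conditional variance, V_hat(x), by regressing
# the squared residuals on the same inputs.
resid_sq = (y_tr - mean_model.predict(X_tr)) ** 2
var_model = GradientBoostingRegressor(random_state=0).fit(X_tr, resid_sq)

# Approximate 95% prediction interval, assuming roughly Gaussian errors.
mu = mean_model.predict(X_te)
sd = np.sqrt(np.clip(var_model.predict(X_te), 1e-9, None))
lower, upper = mu - 1.96 * sd, mu + 1.96 * sd

coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"empirical coverage on test set: {coverage:.3f}")
```

Note that the level-2 targets (squared residuals) are far noisier than the level-1 targets, which is exactly why the variance model needs substantially more data to be trustworthy.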

So there's definitely nothing about prediction intervals that is "cheating" compared with just producing point estimates. It's just harder to do. As an empirical example, in the M4 forecasting competition, only 2 of the 15 methods that produced 95% prediction intervals had nearly correct coverage; most of the other prediction intervals had coverage in the 80-90% range (see slide 35 in the link).
