Solved – Prediction interval for lasso regression with time series data

lasso, machine learning, mathematical-statistics, prediction interval, time series

I am currently working with time series data. My objective is to predict a certain value at time t given some other variables that we will know the same day (but prior to our target variable). After trying several models, I have managed to obtain a relatively good prediction using lasso regression.

However, given the importance of the problem, I would like to have some kind of confidence interval for my predictions; it would be very important to understand how accurate my prediction would be at a given probability level.

One solution I have thought about is using a certain number of past prediction errors (e.g., recent MAE values) to compute a standard deviation and, assuming the errors have a normal distribution, compute a 95% confidence interval as the prediction ±2 standard deviations.
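A minimal sketch of that idea in Python, with hypothetical values standing in for the real out-of-sample errors and lasso point forecast (the names `errors` and `y_hat` are illustrative, not from any particular library):

```python
import numpy as np

# Hypothetical recent out-of-sample errors (actual - predicted) from the lasso model
errors = np.array([1.2, -0.8, 0.5, -1.5, 0.9, 0.3, -0.4, 1.1])
y_hat = 42.0  # hypothetical point forecast for day t

sigma = errors.std(ddof=1)     # sample standard deviation of past errors
lower = y_hat - 2 * sigma      # ~95% interval under a normality assumption
upper = y_hat + 2 * sigma

print(f"Approximate 95% interval: [{lower:.2f}, {upper:.2f}]")
```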

One important consideration is that my dependent variable does not behave the same across the years; it is not stationary.

Would this be a robust way of computing these intervals, or are there better alternatives?

Best Answer

As suggested in the comments, I am turning my comment into an answer so the question will have an answer in the system.

Original:

Check RMSEP (Root Mean Square Error of Prediction). This value tracks your model's ability to predict out-of-sample values. You can calculate RMSEP over different time periods to determine how well you are predicting, which gives you a "standard error" of prediction that could be used in the calculation of a rough confidence interval.
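A rough sketch of how one might compute RMSEP from a rolling, time-ordered evaluation; the simulated data, lasso penalty, and number of splits below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data: X holds the same-day predictors, y the target series
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.3]) + rng.normal(scale=0.5, size=300)

# Rolling-origin splits keep the test fold strictly after the training fold
errors = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Lasso(alpha=0.1).fit(X[train_idx], y[train_idx])
    errors.extend(y[test_idx] - model.predict(X[test_idx]))

errors = np.asarray(errors)
rmsep = np.sqrt(np.mean(errors ** 2))   # root mean square error of prediction
print(f"RMSEP: {rmsep:.3f}")
```

Computing RMSEP separately per split (rather than pooled, as here) is one way to see whether predictive accuracy drifts over time, which matters for a non-stationary target.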

I suppose a bit of an explanation of this answer is needed, since only a limited number of characters are allowed per comment. First, as @StephanKolassa noted in his comment, there is a difference between prediction intervals and confidence intervals. This is an important distinction. When calculating RMSEP, one is attempting to understand how well a model can predict, and one can compare different models by the relative magnitudes of their RMSEPs. The RMSEP by itself may not be extremely informative (in the way $R^2$ is for simple regression), but it can be enlightening for model comparison.

Additionally, building on @ChrisHaug's comment, you may want to look at the out-of-sample errors you use to calculate your RMSEP measure. They provide an empirical distribution that can shed some light on how well your model is predicting. For example, if your errors have an extremely heavy tail, it may indicate that your model does not do a great job if you are looking to avoid extremely unlikely but costly tail events (like an asset manager attempting to avoid exposure to recession-level events).
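A sketch of how the empirical error distribution could be turned into an interval and inspected for heavy tails; here simulated heavy-tailed errors stand in for the real out-of-sample errors collected above:

```python
import numpy as np
from scipy.stats import kurtosis

# Hypothetical out-of-sample errors (actual - predicted); a Student-t draw
# is used here only to illustrate a heavy-tailed error distribution
rng = np.random.default_rng(1)
errors = rng.standard_t(df=3, size=500)

# Empirical 95% interval around a hypothetical point forecast, no normality assumption
lo, hi = np.quantile(errors, [0.025, 0.975])
y_hat = 42.0
print(f"Empirical 95% interval: [{y_hat + lo:.2f}, {y_hat + hi:.2f}]")

# A quick check of tail heaviness relative to a normal distribution
print(f"Excess kurtosis of errors: {kurtosis(errors):.2f}")  # ~0 for normal, large for heavy tails
```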
