Solved – Variable selection in time-series forecasting

feature-selection, forecasting, prediction, predictive-models, time-series

I have a time-series forecasting task and would like some input on variable selection and regularisation.

My problem has the following characteristics:

  • 2,000,000 sample size.
  • Most of the time there is no change in the response. A change happens only 0.3% of the time and is what I'm interested in (don't talk about asymmetric loss functions and the like – I'm purely interested in variable selection/regularisation).
  • I will be predicting direction (up/down/no change).
  • 100 predictor variables, mostly hand-engineered from complex non-stationary data. All carry some predictive power – the useless ones with no univariate forecasting power were eliminated early in the process.
  • There are clusters of highly correlated predictors, but two predictors chosen at random will generally not be highly correlated.
  • Predictors are heteroscedastic in nature.
  • Some predictors are heavily serially correlated, many are not.
  • All predictors evolve continuously, take real (decimal) values, and are roughly bell shaped.
  • Some could be non-stationary, but they are thought to carry critical information about the evolution of the response process that would be lost under differencing or similar transformations to $I(0)$.
  • My forecasting model will be linear, but the loss function is not settled on least squares (perpendicular regression, quantile regression and discrete-response regression will all be tried).
  • Estimator unbiasedness and standard errors are not a direct consideration. All I care about is out-of-sample forecast accuracy and model generalisation.

I know I can use anything from the non-negative garrote, ridge, lasso, elastic nets, random subspace learning, PCA/manifold learning and least angle regression to various yucky hacks that select purely on out-of-sample forecast performance. But is there anything specific to forecasting, or to the characteristics of my data, that would push me one way over another? (Time efficiency is important – trying everything and selecting the best is not practical.)
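For concreteness, here is a minimal sketch of one of the options above: elastic-net shrinkage for a three-class direction target, tuned with walk-forward (time-ordered) cross-validation so that no future information leaks into the fit. The simulated data, class proportions and hyperparameter grids are stand-ins, not my actual setup.

```python
# Minimal sketch: elastic-net variable selection for an up/down/no-change
# target with walk-forward CV. All data and grids below are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 10_000, 100                       # stand-ins for 2,000,000 x 100
X = rng.standard_normal((n, p))
# Class imbalance inflated from the real 0.3% so every CV fold sees all classes.
y = rng.choice([-1, 0, 1], size=n, p=[0.01, 0.98, 0.01])

# Standardise once for simplicity; in a careful pipeline, scale within each fold.
X = StandardScaler().fit_transform(X)

# Walk-forward CV: each fold trains on the past and validates on the future.
cv = TimeSeriesSplit(n_splits=5)

model = LogisticRegressionCV(
    Cs=10,                               # grid of inverse regularisation strengths
    cv=cv,
    penalty="elasticnet",
    l1_ratios=[0.1, 0.5, 0.9],           # mix of ridge- and lasso-like shrinkage
    solver="saga",
    max_iter=5000,
    n_jobs=-1,
)
model.fit(X, y)

# Predictors whose coefficients are shrunk to zero in every class get dropped.
selected = np.where(np.abs(model.coef_).max(axis=0) > 0)[0]
print(f"kept {selected.size} of {p} predictors")
```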

Note that decision boundaries on the forecasts are not intended to be part of the regularisation algorithm, but integration could be possible.

I will also be using a supercomputer.

Best Answer

This type of question is not really well suited to this site, since there is no single correct answer. Here are my observations.

  1. Your response variable is censored, so a linear model may not be the best choice, since it clearly will not produce exact zeroes. I would look into some sort of censored regression, such as tobit regression.
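     To make the suggestion concrete, here is a minimal sketch of a standard tobit model (left-censored at zero) fitted by maximum likelihood on simulated data. Your response is piled up at zero with moves in both directions, so in practice a two-limit or otherwise adapted variant would be needed; everything below is illustrative only.

```python
# Minimal tobit sketch: latent y* = X @ beta + eps, observed y = max(y*, 0).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
beta_true = np.array([-1.0, 0.5, -0.3, 0.8])
y_latent = X @ beta_true + rng.standard_normal(n)
y = np.maximum(y_latent, 0.0)            # observed response, censored at zero

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)            # keep sigma positive
    xb = X @ beta
    censored = y <= 0
    ll = np.empty(n)
    # Uncensored observations: ordinary normal density of the residual.
    ll[~censored] = stats.norm.logpdf(y[~censored], loc=xb[~censored], scale=sigma)
    # Censored observations: probability that the latent value falls at or below zero.
    ll[censored] = stats.norm.logcdf(-xb[censored] / sigma)
    return -ll.sum()

start = np.zeros(X.shape[1] + 1)
res = optimize.minimize(neg_loglik, start, method="BFGS")
print("estimated beta: ", np.round(res.x[:-1], 2))
print("estimated sigma:", np.round(np.exp(res.x[-1]), 2))
```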

  2. Since you want a regression model, forecasting future values of the response variable involves forecasting future values of the predictor variables. So when evaluating forecasting performance you should feed the model forecasts of the predictor variables, not their realised out-of-sample values, to get a more reliable estimate of forecasting performance.
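     A minimal sketch of that evaluation idea, assuming for illustration that each predictor is forecast one step ahead with its own AR(1) model (the forecasting models, data and window sizes are all hypothetical):

```python
# Compare out-of-sample error when the regression is fed realised predictor
# values versus one-step AR(1) forecasts of those predictors.
import numpy as np

rng = np.random.default_rng(2)
n, p = 2_000, 5
X = np.zeros((n, p))
for j in range(p):                       # simulate persistent predictors
    for t in range(1, n):
        X[t, j] = 0.8 * X[t - 1, j] + rng.standard_normal()
beta = rng.standard_normal(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

train, test = slice(0, 1500), slice(1500, n)

# Fit the forecasting regression on the training window.
b_hat, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Fit an AR(1) per predictor on the training window, forecast x_t from x_{t-1}.
X_fcst = np.empty_like(X[test])
for j in range(p):
    x = X[train, j]
    phi = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])   # AR(1) slope by least squares
    X_fcst[:, j] = phi * X[1499:n - 1, j]

rmse_realised = np.sqrt(np.mean((y[test] - X[test] @ b_hat) ** 2))
rmse_forecast = np.sqrt(np.mean((y[test] - X_fcst @ b_hat) ** 2))
print(f"RMSE with realised predictors:   {rmse_realised:.3f}")
print(f"RMSE with forecasted predictors: {rmse_forecast:.3f}")
```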