Time Series – Effective Variable Selection Techniques in Time Series Data

feature selection, lasso, regression, time series, univariate

I have an econometric dataset, 50 observations of 350 variables.
They include things like GDP, unemployment, and interest rates, along with their transformations such as YoY change, log transform, first differences, etc. I need to build an ARIMAX model, and first I need to select variables.

I ran 350 univariate regressions against the response and chose the 20 best predictor variables based on R-squared.
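For concreteness, the screening step described above can be sketched in Python with NumPy. This is only an illustration: the data is synthetic noise standing in for the real dataset (so the ranking it produces is meaningless), and the helper name `univariate_r2` is hypothetical. It relies on the fact that the R-squared of a one-predictor regression with an intercept equals the squared Pearson correlation, which also means this score does not depend on the scale of the predictor.

```python
import numpy as np

def univariate_r2(X, y):
    """R-squared of a simple regression (with intercept) of y on each column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # With one predictor plus intercept, R^2 is the squared correlation.
    num = (Xc * yc[:, None]).sum(axis=0) ** 2
    den = (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    return num / den

# Sizes taken from the question; the values themselves are synthetic.
rng = np.random.default_rng(0)
n_obs, n_vars = 50, 350
X = rng.normal(size=(n_obs, n_vars))
y = rng.normal(size=n_obs)  # placeholder response, unrelated to X

r2 = univariate_r2(X, y)
top20 = np.argsort(r2)[::-1][:20]  # column indices of the 20 highest R^2 values
```

Running the same ranking on a placeholder response like this one is a cheap way to see how large the best of 350 R-squared values can get by chance alone with 50 observations.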

My question is: is univariate regression a good way to screen predictor variables?
I have read that variables perform differently when combined with others than they do alone. Is there anything I need to check about my data before pruning my set of predictor variables this way? (My response variable is a log return whose mean is close to zero. The transformed predictor variables vary in scale: some are in log scale, others range in the 100,000s. I expect most of the transformed ones to be stationary.)

Also, I tried running a Lasso selection in SAS with all the variables, and Lasso terminated after just 1 step, selecting only one variable. There was a message which said that only 5 records out of the 50 observations were used by Lasso. Could this be due to missing values? My data doesn't have too many missing values, so I was surprised. Maybe it's because there are far more predictors than observations (350 vs. 50).
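The missing-value explanation can be checked by counting complete rows. The sketch below assumes the procedure drops any observation that has a missing value in at least one candidate variable, which is an assumption about the software's behavior, not something confirmed here. The 0.5% per-cell missing rate is also hypothetical and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_vars = 50, 350  # sizes from the question; values are synthetic
X = rng.normal(size=(n_obs, n_vars))

# Hypothetical missingness: each cell is missing with probability 0.5%.
mask = rng.random(size=X.shape) < 0.005
X[mask] = np.nan

missing_share = np.isnan(X).mean()          # overall fraction of missing cells
complete_rows = ~np.isnan(X).any(axis=1)    # rows with no missing value at all
n_complete = int(complete_rows.sum())

print(f"missing cells: {missing_share:.2%}, complete rows: {n_complete} of {n_obs}")
```

Even when very few cells are missing overall, a row survives only if all 350 of its cells are present, so the number of usable observations can shrink sharply. Applying the same two lines (`np.isnan(X).any(axis=1)` and its sum) to the real data would show whether this accounts for the 5 records reported.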

Thanks for any advice on how to proceed.

Best Answer

Your approach fails to consider the various forms of delayed response to one or more of the candidate predictors. When determining the appropriate subset of variables, you need to pre-whiten the variables and form impulse response weights to identify the important lags of each candidate, while also taking into account possible deterministic effects such as pulses and level shifts. We refer to this problem as kitchen-sink modelling, since you are throwing everything but the kitchen sink into the mix.
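As a rough illustration of the pre-whitening idea, the sketch below fits an AR(1) filter to a candidate predictor, applies the same filter to the response, and computes the cross-correlation between the filtered series at lags 0 to 6. Several things here are assumptions rather than part of the answer: the AR(1) filter stands in for whatever ARIMA model would really be identified for each predictor, the function names are hypothetical, the data is synthetic (200 points, longer than the 50 observations in the question), and pulses and level shifts are not handled at all.

```python
import numpy as np

def fit_ar1(x):
    """OLS estimate of phi in (x_t - mean) = phi * (x_{t-1} - mean) + e_t."""
    xc = x - x.mean()
    return float(xc[1:] @ xc[:-1] / (xc[:-1] @ xc[:-1]))

def prewhitened_ccf(x, y, max_lag=6):
    """Correlation of filtered y_t with filtered x_{t-k} for k = 0..max_lag."""
    phi = fit_ar1(x)
    xc, yc = x - x.mean(), y - y.mean()
    a = xc[1:] - phi * xc[:-1]  # pre-whitened predictor
    b = yc[1:] - phi * yc[:-1]  # response passed through the same filter
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    ccf = np.array([(b[k:] @ a[:len(a) - k]) / denom for k in range(max_lag + 1)])
    return phi, ccf

# Synthetic example: x is AR(1), and y responds to x with a two-period delay.
rng = np.random.default_rng(2)
n = 200
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]
y = np.zeros(n)
y[2:] = 1.5 * x[:-2]
y += 0.3 * rng.normal(size=n)

phi, ccf = prewhitened_ccf(x, y)
best_lag = int(np.argmax(np.abs(ccf)))  # lag with the largest cross-correlation
```

The cross-correlations of the filtered series are proportional to the impulse response weights, so on this synthetic data the largest value should appear at lag 2, where the delayed dependence was planted. A univariate regression on the contemporaneous value alone would not be looking at that lag.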