Solved – What to choose from BIC/AIC/ridge/elastic net

Tags: aic, bic, ridge regression

I have the following regression problem:

I have about 60 independent variables; some of them are highly correlated with others. I have around 3 million observations.

(1) – My main goal is out-of-sample prediction, so my main question is: which regularization method should I use in this case?

Some further questions (based on assumptions of mine that are probably a little confused):

(2) – Ridge regression does not remove coefficients entirely; instead, it shrinks the coefficients that lasso/elastic net/BIC would remove completely. Is that correct? (And if it isn't, would that be a problem?)
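The shrinkage-versus-selection contrast in (2) can be seen in a minimal sketch (using scikit-learn; the data, seed, and penalty strengths are illustrative assumptions, not from the post): lasso sets some coefficients exactly to zero, while ridge only shrinks them toward zero.

```python
# Illustrative sketch: ridge shrinks coefficients, lasso zeroes some exactly.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)                    # pure noise, irrelevant to y
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + rng.normal(size=n)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge coefs:", ridge.coef_)  # all nonzero, shrunk
print("lasso coefs:", lasso.coef_)  # redundant/irrelevant coefs exactly 0
```

With the L1 penalty, the coefficient on the irrelevant predictor (and typically one of the near-duplicates) is exactly zero; the L2 penalty keeps every coefficient nonzero but small.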

(3) – If I wanted to use AIC/BIC in this case, would I have to test all possible combinations of the 60 independent variables?
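For scale, exhaustive (all-subsets) AIC/BIC comparison over 60 predictors means one candidate model per subset, i.e. 2^60 models. A quick back-of-the-envelope check:

```python
# Exhaustive (all-subsets) selection over 60 predictors requires fitting
# 2**60 candidate models, one per subset of the predictors.
n_predictors = 60
n_models = 2 ** n_predictors
print(n_models)  # 1152921504606846976, roughly 1.15e18

# Even at a million model fits per second, this takes tens of millennia:
seconds = n_models / 1e6
years = seconds / (60 * 60 * 24 * 365)
print(round(years))
```

This is why all-subsets selection is infeasible here and stepwise or penalized approaches are used instead.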

(4) – Would it make sense to start with AIC/BIC and then run ridge regression on the remaining independent variables? (I guess ridge regression after AIC/BIC might make sense because some of the remaining variables would still be correlated with each other?)

Thanks

Best Answer

There seem to be two related issues here: 1) Overfitting and 2) Collinearity. As @fg said in a comment, with so many observations overfitting is not likely to be a real problem. However, collinearity may be.

High correlations among IVs are often a sign of problematic collinearity - that is, collinearity that can cause the model to be poorly estimated - but they are neither a necessary nor a sufficient condition for it. Since you are estimating a linear model, I suggest getting the condition indices and the matrix of variance-decomposition proportions (you don't say whether you are using R, SAS, SPSS, or something else, but these diagnostics are available in all three of those and probably others). High condition indices (30 or so is a commonly recommended threshold) with associated high variance proportions on two or more variables may cause problems.
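A numpy-only sketch of these Belsley-style diagnostics (the data here are made up for illustration): scale the design-matrix columns to unit length, take the singular values, and form condition indices plus variance-decomposition proportions.

```python
# Belsley-style collinearity diagnostics: condition indices and
# variance-decomposition proportions, via the SVD of the scaled design matrix.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)            # near-duplicate of x1
X = np.column_stack([np.ones(n), x1, x2, rng.normal(size=n)])

Xs = X / np.linalg.norm(X, axis=0)             # columns scaled to unit length
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
cond_indices = s[0] / s                        # ratio of largest to each SV
phi = (Vt.T ** 2) / s ** 2
vdp = phi / phi.sum(axis=1, keepdims=True)     # rows: coefs, cols: components

print("condition indices:", cond_indices)
print("variance-decomposition proportions:\n", vdp.round(3))
```

Here the near-duplicate pair produces a condition index far above the ~30 rule of thumb, and the corresponding component carries most of the coefficient variance for both collinear predictors - the pattern described above.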

An alternative to this (which also works well for nonlinear models), if you are using R, is the perturb package.

The main problem collinearity causes is that small changes in the input data can produce large changes in the estimated model (and thus, possibly, large changes in predictions).
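A rough Python analogue of the perturbation idea (an illustrative sketch, not the perturb package itself): add small random noise to nearly collinear predictors, refit by least squares, and watch how much the coefficients move.

```python
# Perturbation check: under collinearity, tiny input noise swings coefficients.
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear pair
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

base = ols(X, y)
coefs = []
for _ in range(50):
    Xp = X + 0.02 * rng.normal(size=X.shape)   # tiny perturbation of inputs
    coefs.append(ols(Xp, y))
spread = np.std(coefs, axis=0)

print("baseline coefs:", base)
print("coef std under perturbation:", spread)
```

The individual coefficients are unstable even though their sum (and hence the predictions) stays roughly constant, which is exactly the symptom the diagnostics above are meant to flag.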