I am trying to do variable selection using the elastic net (MATLAB's `lasso` function with `Alpha` set to 0.5). I have 75 predictors in total, some of which are correlated with each other (hence elastic net instead of the lasso), and I would like to find a subset of them that predicts my outcome well.
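For concreteness, the setup can be sketched in Python with scikit-learn as a stand-in for the MATLAB call (an assumption on my part; the data below is synthetic, and scikit-learn's `l1_ratio` plays the role of MATLAB's `Alpha`):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in for the data described: 75 predictors, some correlated
rng = np.random.default_rng(42)
n, p = 150, 75
base = rng.normal(size=(n, 15))
# Blocks of 5 highly correlated columns each
X = np.repeat(base, 5, axis=1) + 0.3 * rng.normal(size=(n, p))
y = X[:, 0] + X[:, 5] + X[:, 10] + rng.normal(size=n)

# l1_ratio=0.5 mirrors lasso(..., 'Alpha', 0.5); the penalty strength is cross-validated
enet = ElasticNetCV(l1_ratio=0.5, cv=5, max_iter=10000).fit(X, y)
selected = np.flatnonzero(enet.coef_)  # indices of the retained predictors
```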
So my question is: how can I calculate something like $R^2$ that shows how much of my outcome is explained by the selected variables?

- If I use the selected variables in a multiple linear regression model, is the $R^2$ going to be valid, given that my variables are correlated?
- Can I calculate a cross-validated $R^2$ (using leave-one-out) to get a more accurate estimate?
- Is there any way other than calculating $R^2$ to show that my variable selection method predicts well?
Best Answer
Just use the regular $R^2$, i.e. the squared correlation between the fitted and the actual values. Whether the model was fit by OLS or by penalized OLS (such as the elastic net), it will still reflect the proportion of variance explained.
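As a minimal sketch of this computation (using scikit-learn's `ElasticNet` in Python as an assumed analog of MATLAB's `lasso`, with made-up synthetic data):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data: 200 observations, 75 correlated predictors (as in the question)
rng = np.random.default_rng(0)
n, p = 200, 75
latent = rng.normal(size=(n, 5))
X = latent @ rng.normal(size=(5, p)) + 0.5 * rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 1.0  # only the first 10 predictors actually matter
y = X @ beta + rng.normal(size=n)

# scikit-learn's l1_ratio corresponds to MATLAB's 'Alpha' parameter
model = ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10000).fit(X, y)
fitted = model.predict(X)

# R^2 as the squared correlation between fitted and actual values
r2 = np.corrcoef(fitted, y)[0, 1] ** 2
```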
Be aware, however, that model diagnostics and performance measures (such as $R^2$) applied after model selection may (and will) be overly optimistic if the model is evaluated on the same data that was used for model building (e.g. variable selection).
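To see this optimism concretely, here is a hedged sketch (scikit-learn in Python; all settings are illustrative assumptions) comparing the in-sample $R^2$ with a cross-validated $R^2$ on pure-noise data, where the true $R^2$ is zero:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(1)
n, p = 60, 75  # more predictors than observations: optimism is severe
X = rng.normal(size=(n, p))
y = rng.normal(size=n)  # pure noise: no real relationship to explain

model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)

# In-sample R^2: fit and evaluate on the same data -> optimistic
fitted = model.fit(X, y).predict(X)
r2_in = np.corrcoef(fitted, y)[0, 1] ** 2

# Cross-validated R^2: each prediction comes from a model that never saw that point
oof = cross_val_predict(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
r2_cv = np.corrcoef(oof, y)[0, 1] ** 2
```

The in-sample $R^2$ comes out well above the cross-validated one, even though the outcome is pure noise; only the cross-validated figure is an honest estimate of out-of-sample performance.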