I have a dataset with around 30 independent variables and would like to construct a generalized linear model (GLM) to explore the relationship between them and the dependent variable.
I am aware that the method I was taught for this situation, stepwise regression, is now considered a statistical sin.
What modern methods of model selection should be used in this situation?
Best Answer
There are several alternatives to stepwise regression. Among those I have seen used most often are Partial Least Squares (PLS) regression and the LASSO. Both are implemented in R packages:
PLS (the pls package): http://cran.r-project.org/web/packages/pls/
LASSO (the lars package, which fits it via least angle regression): http://cran.r-project.org/web/packages/lars/index.html
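To make concrete how the LASSO performs variable selection, here is a minimal sketch in plain Python (standard library only). It is not the algorithm used by the lars package: it uses cyclic coordinate descent with soft-thresholding, and it covers only the ordinary (Gaussian) linear model rather than a general GLM. The function names, the synthetic data, and the penalty value `lam=0.3` are all assumptions made up for illustration.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything in [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    X is a list of rows with centred columns, y is a centred list; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes j's own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic data (an assumption for illustration): 10 predictors, only columns 0 and 3 matter.
rng = random.Random(0)
n, p = 200, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0, 0.5) for row in X]

# Centre the columns and the response so that no intercept is needed.
col_means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

beta = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected columns:", selected)
print("coefficients:", [round(b, 3) for b in beta])
```

The penalty sets the coefficients of irrelevant predictors to exactly zero, which is what makes the LASSO a selection method, and it shrinks the remaining coefficients toward zero. In practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.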
If you only want to explore the relationship between your dependent variable and the independent variables (i.e. you do not need statistical significance tests), I would also recommend machine learning methods such as Random Forests or Classification/Regression Trees. Random Forests can also approximate complex non-linear relationships between your dependent and independent variables, which linear techniques (like Linear Regression) might not reveal.
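As a rough illustration of the point about non-linear relationships, the following sketch grows a single small regression tree in plain Python and compares it with a straight-line fit on data where the response is quadratic in the predictor. This is a single tree, not a Random Forest (a forest averages many such trees grown on bootstrap samples), and the function names, the synthetic data, and the settings `max_depth=3` and `min_leaf=5` are assumptions chosen for illustration only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=3, min_leaf=5):
    """Grow a regression tree on one predictor using greedy squared-error splits.

    A leaf is a float (the mean response); an internal node is (threshold, left, right).
    """
    mean = sum(ys) / len(ys)
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return mean
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return mean
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left = build_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]],
                      depth + 1, max_depth, min_leaf)
    right = build_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]],
                       depth + 1, max_depth, min_leaf)
    return (threshold, left, right)


def predict(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Synthetic data (an assumption for illustration): y is quadratic in x plus noise.
rng = random.Random(1)


def sample(n):
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


train_x, train_y = sample(200)
test_x, test_y = sample(200)

a, b = fit_line(train_x, train_y)
tree = build_tree(train_x, train_y)

linear_mse = mse([a + b * x for x in test_x], test_y)
tree_mse = mse([predict(tree, x) for x in test_x], test_y)
print("linear test MSE:", round(linear_mse, 3))
print("tree test MSE:  ", round(tree_mse, 3))
```

Because the relationship is symmetric around zero, the fitted line is nearly flat and explains very little, while the tree approximates the curve with a handful of constant pieces. The price is that the tree gives no coefficients or significance tests, which is why it suits exploration rather than inference.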
A good starting point for machine learning in R is the Machine Learning task view on CRAN:
http://cran.r-project.org/web/views/MachineLearning.html