Solved – Alternatives to stepwise regression for generalized linear mixed models

feature-selection, generalized-linear-model, mixed-model, r

Are there any easy-to-use alternatives to stepwise variable selection for GLMMs? I have seen implementations of, e.g., the LASSO for linear regression, but so far nothing for mixed models. Mixed models seem non-trivial in general, so I am wondering whether any of the newer methods have been adapted to them (and possibly implemented in R). In the meantime, using whatever selection procedure you like and then validating the results seems a sensible way to go.

To give some context: in my current project I am looking at approximately 700 variables and 5000 binary observations. Stepwise selection takes about a day, and many variables have about 10% missingness.

Edit: Thank you for the very interesting answers so far! Two concerns I have are: do these new methods have longer runtimes than stepwise selection, and can they deal with missing data? (If each variable has different missingness, then with hundreds of variables it is very easy to lose all observations in a complete-case analysis – something stepwise selection can cope with by only using small subsets of the available variables at a time.)

Best Answer

How about the ensemble method of bootstrap aggregating, also known as bagging? With this approach you create a large number of replicates of the original dataset by simple random sampling with replacement (say 10,000 bootstrapped datasets). You then run a variable selection routine (perhaps best subsets or traditional stepwise selection) on each bootstrapped sample and record how often each predictor is selected. Predictors that appear in, say, 90% or more of the samples are then used in the final mixed model. There are many other methods that could be used, but I highlight this one because it is simple to explain and usually very easy to implement. For more information see Breiman, Leo (1996). "Bagging predictors". Machine Learning 24 (2): 123–140. doi:10.1007/BF00058655.
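
Below is a minimal R sketch of this bootstrap-selection idea. The toy data frame `dat`, the number of replicates, the 90% threshold, and the use of plain stepwise `glm()` inside the loop are all illustrative assumptions; in practice the inner fit would be your mixed model (e.g. `glmer()` from lme4) and whichever selection routine you prefer.

```r
## Bootstrap-based variable selection: resample the data, run a selection
## routine on each replicate, and tally how often each predictor survives.

set.seed(1)

## toy data standing in for the real dataset (assumption)
n <- 500
dat <- data.frame(
  y  = rbinom(n, 1, 0.5),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)

predictors <- setdiff(names(dat), "y")
B <- 200                                    # number of bootstrap replicates
hits <- setNames(numeric(length(predictors)), predictors)

for (b in seq_len(B)) {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]        # resample rows
  full <- glm(y ~ ., data = boot, family = binomial)       # full model
  sel  <- step(full, direction = "backward", trace = 0)    # stepwise selection
  kept <- intersect(names(coef(sel)), predictors)          # surviving predictors
  hits[kept] <- hits[kept] + 1
}

## selection frequency for each predictor; keep those above a threshold
sel_freq <- hits / B
sel_freq
keep <- names(sel_freq)[sel_freq >= 0.9]
keep
```

The surviving predictors in `keep` would then enter the final mixed model. Because each replicate only needs a rough-and-ready selection step, a faster or simplified fit inside the loop is usually acceptable; the stability of the selection frequencies across replicates is what matters.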