Your question has an implicit assumption that $R^2$ is a good measure of the quality of the fit and is appropriate for comparing models. I think your background information provides evidence that $R^2$ is not a good tool for what you are trying to do: after all, you can increase $R^2$ simply by adding nonsense variables to your model.
Did you take the variables that were found using the elastic net and refit a new regression model with those variables, rather than using the estimates from the elastic-net fit? That is rather like entering your data into a nice statistical software program, using it only to round the data and print it out, and then calculating the mean on an abacus.
If you want the fewest predictors possible (while still getting a reasonable fit), then lasso methods will tend to result in fewer predictors than elastic-net methods. The advantage of the elastic net is not in finding the fewest variables, but in finding a good model that takes advantage of the information in the variables and avoids the bias that you get with stepwise models.
A better comparison would be how well the models predict a new set of observations, or perhaps a PRESS statistic or cross-validation.
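For example, here is a rough sketch of that kind of comparison using cross-validation with the glmnet package; the simulated x and y and the alpha = 0.5 mixing value are just placeholders for your own data and settings:

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)            # simulated predictors (placeholder)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)            # simulated response (placeholder)
cv.lasso <- cv.glmnet(x, y, alpha = 1)           # lasso
cv.enet  <- cv.glmnet(x, y, alpha = 0.5)         # elastic net (example mixing value)
min(cv.lasso$cvm)                                # cross-validated error, lasso
min(cv.enet$cvm)                                 # cross-validated error, elastic net
sum(coef(cv.lasso, s = "lambda.min")[-1] != 0)   # predictors kept by the lasso
sum(coef(cv.enet,  s = "lambda.min")[-1] != 0)   # predictors kept by the elastic net

This compares the two penalties on cross-validated prediction error and on the number of predictors they retain, rather than on $R^2$.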
You can also perform a likelihood-ratio test (LRT) with mixed models; the command is again anova(model1, model2). But there are a few things you have to consider. To use the anova command after an lme model that has been fitted by REML (restricted maximum likelihood, the default), the models have to include the same fixed effects and both have to be fitted via REML. If you want to calculate a likelihood-ratio test for models with different fixed effects, you have to fit the models via maximum likelihood (ML). You can do that by specifying the option method="ML" within the lme command. After that, you can just type anova(lme.mod1, lme.mod2) to calculate the LR test. If the LR test is significant, you have evidence that the model including probCategorySame is an improvement over the model without that variable.
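As a minimal sketch of that workflow, here is what the ML refit and comparison could look like, using the Orthodont data that ships with nlme as a stand-in for your own data and with Sex playing the role of probCategorySame:

library(nlme)
lme.mod1 <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")        # simpler model, fitted by ML
lme.mod2 <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                data = Orthodont, method = "ML")        # adds the fixed effect of interest
anova(lme.mod1, lme.mod2)                               # likelihood-ratio test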
As an alternative, you can also simulate the likelihood-ratio test with the following syntax (lm1 is the simpler model and lm2 is the alternative model):
sim.lme <- simulate(lm1, nsim=1000, m2=lm2, method="ML")  # simulate data under lm1 and refit both models
plot(sim.lme)                                             # compare empirical and nominal p-values
This produces a plot of the simulated empirical $p$-values of the LR test against the nominal $p$-values.
The key is that the simulated empirical $p$-values should be roughly the same as the nominal $p$-values (the blue line should be diagonal). If that is the case, you can use the nominal $p$-value from the likelihood-ratio test.
In their book Mixed-Effects Models in S and S-PLUS (pages 87-92), Pinheiro and Bates discourage the use of likelihood-ratio tests for assessing the significance of fixed effects. They recommend the conditional $F$-test instead, which you can obtain with the anova command applied to a single model. In your case, that would be anova(lm2).
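Continuing the hypothetical Orthodont example from above, the conditional $F$-tests would be obtained with:

lme.reml <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                data = Orthodont)                       # REML fit (the default)
anova(lme.reml)                                         # conditional F-tests, including the one for Sex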
Best Answer
Either of a pair of highly correlated predictors may be regarded as "redundant". So in addition to the other problems with stepwise variable selection methods, it tends to be a toss-up which one gets into the final model. Use bootstrap validation and see how (in)consistently the same predictors are selected in different bootstrap samples. Avoid the temptation, therefore, to think that the variables eliminated by stepwise selection are irrelevant in any broader sense than that you got away without using them to fit a model to a particular sample. Better ways of dealing with multicollinearity are data reduction prior to modelling or ridge regression: see here.
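Here is a minimal sketch of that kind of bootstrap check with simulated data; dat, the number of resamples B, and AIC-based step() selection are placeholders for your own data and selection procedure:

set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$x5 <- dat$x1 + rnorm(n, sd = 0.1)               # x5 is highly correlated with x1
dat$y  <- dat$x1 + 0.5 * dat$x2 + rnorm(n)
B <- 200                                            # number of bootstrap resamples
selected <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]  # bootstrap resample of the rows
  fit  <- step(lm(y ~ ., data = boot), trace = 0)   # stepwise selection on the resample
  names(coef(fit))[-1]                              # predictors kept (drop the intercept)
}, simplify = FALSE)
sort(table(unlist(selected)), decreasing = TRUE) / B   # selection proportion for each predictor

The idea is that a highly correlated pair such as x1 and x5 will tend to trade places across resamples rather than being selected consistently. The validate() function in the rms package offers a more thorough, built-in version of this kind of check.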