Solved – Interpret and compare lasso models

glmnet, lasso, logistic regression

I'm using lasso logistic regression to identify important variables and make inferences. For that I use glmnet with repeated cross-validation to identify the best tuning parameter lambda.
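
In outline, the fitting step looks something like this (a minimal sketch, not my actual code; x_controls, x_full, and y are placeholders for my design matrices and the 0/1 outcome):

    library(glmnet)

    # Sketch: repeat cv.glmnet over several random fold assignments and
    # average the minimum CV binomial deviance and the corresponding lambda.
    # (Averaging lambda.min over repeats is a simplification.)
    repeated_cv <- function(x, y, n_rep = 20) {
      res <- sapply(seq_len(n_rep), function(i) {
        set.seed(i)
        cv <- cv.glmnet(x, y, family = "binomial", type.measure = "deviance")
        c(cv_error = min(cv$cvm), lambda = cv$lambda.min)
      })
      rowMeans(res)
    }

    repeated_cv(x_controls, y)   # controls only
    repeated_cv(x_full, y)       # controls + additional predictors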

In the first step I build a model including only my control variables.
In the second step I add the other predictors and compare the selected variables, fit, and performance to those of the first model. Below are the measures for the two models.

controls (CV error | % deviance): 1.320194 | 15.26

full model (CV error | % deviance): 1.3705 | 14.97

I identify a best lambda for each of the two models (0.03 | 0.09). The first column is the mean cross-validation error, measured as binomial deviance. The second column is the % of deviance explained relative to the intercept-only model when the model is fit to the whole data set.
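
For concreteness, the two numbers are taken from the cv.glmnet object roughly as follows (a sketch; object names are the placeholders from above):

    # Sketch: extract the two reported quantities from a cv.glmnet fit.
    cvfit <- cv.glmnet(x_full, y, family = "binomial", type.measure = "deviance")

    # Column 1: mean cross-validated binomial deviance at the selected lambda
    cv_error <- cvfit$cvm[cvfit$lambda == cvfit$lambda.min]

    # Column 2: % of null deviance explained by the whole-data fit at that
    # lambda (glmnet stores this as dev.ratio)
    pct_dev <- 100 * cvfit$glmnet.fit$dev.ratio[cvfit$lambda == cvfit$lambda.min]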

As far as I know, the first column tells me something about the predictive accuracy of the models, and the second column tells me something about the performance improvement over the null model (comparable to R^2 in linear regression?).
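
If I understand the glmnet documentation right, the quantity it reports as dev.ratio is

    D^2 = 1 - \frac{\mathrm{Dev}_{\text{model}}}{\mathrm{Dev}_{\text{null}}},

i.e. the fraction of the null deviance explained by the model, which is what I mean by the R^2 analogy.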

As one can see, my full model doesn't perform better than the controls-only model, even though additional variables are selected. Even when I build another model in which the controls are left unpenalized so they all stay in the model, the lasso selects one additional variable.
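
(By "don't penalize the controls" I mean glmnet's penalty.factor argument; roughly, assuming the controls are the first columns of x_full:)

    # Sketch: give the control columns penalty 0 so the lasso cannot drop them;
    # the remaining predictors keep the standard penalty of 1.
    k  <- ncol(x_controls)                        # number of control variables
    pf <- c(rep(0, k), rep(1, ncol(x_full) - k))  # assumes controls come first in x_full
    cv_unpen <- cv.glmnet(x_full, y, family = "binomial",
                          type.measure = "deviance", penalty.factor = pf)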

What would you conclude?

Why is there no improvement in performance?

Are the predictors useful even though they don't improve the model?

Best Answer

Looking only at the CV error (I don't think a training-error figure like your "% deviance" is very helpful, especially for a penalized model like the lasso), I would conclude that, at least around your sample size, the predictors beyond the control variables aren't helpful for prediction. Adding them to the model only worsened predictive accuracy. So, they aren't useful.

Why they're not helpful for prediction is of course hard to guess at without knowing the context of this problem.

By the way, the right way to compare your models' predictive accuracy to that of a trivial model is not to look at training error, but to compute CV error for the trivial model.
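
A minimal sketch of that, assuming the outcome y is coded 0/1 (the fold setup here is purely illustrative):

    # K-fold CV binomial deviance for the intercept-only (trivial) model,
    # on the same scale as cv.glmnet's cvm with type.measure = "deviance".
    set.seed(1)
    K     <- 10
    folds <- sample(rep(1:K, length.out = length(y)))
    dev_per_fold <- sapply(1:K, function(k) {
      p_hat <- mean(y[folds != k])   # intercept-only "fit" on the training folds
      yk    <- y[folds == k]
      -2 * mean(yk * log(p_hat) + (1 - yk) * log(1 - p_hat))
    })
    mean(dev_per_fold)   # compare with the CV errors of the two lasso models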