Solved – Building final model in glmnet after cross validation

Tags: glmnet, machine learning, modeling, regularization

This is my first time working with regularized regression, so I apologize if the answer to this is obvious. I am planning to use glmnet to run a regularized logistic regression on my data set. Previously I used unregularized logistic regression and evaluated/compared different models using repeated random sub-sampling validation. After cross validation I then refit the better-performing model on the entire data set to come up with the final model (i.e., the model that will be applied to new incoming data). This procedure follows the advice of this (highly ranked) Cross Validated discussion.
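For context, my previous workflow looked roughly like the sketch below; the data frame dat, the 70/30 split, and the two candidate formulas are placeholders rather than my actual models:

# sketch of repeated random sub-sampling validation with unregularized glm()
# `dat` is a placeholder data frame with a binary 0/1 outcome `asthma`
set.seed(1)
n_reps <- 100
acc <- matrix(NA, nrow = n_reps, ncol = 2)
for (i in seq_len(n_reps)) {
  train_idx <- sample(nrow(dat), size = floor(0.7 * nrow(dat)))
  train <- dat[train_idx, ]
  test  <- dat[-train_idx, ]
  m1 <- glm(asthma ~ age + bmi_p, data = train, family = binomial)
  m2 <- glm(asthma ~ age + bmi_p + gender, data = train, family = binomial)
  p1 <- predict(m1, newdata = test, type = "response") > 0.5
  p2 <- predict(m2, newdata = test, type = "response") > 0.5
  acc[i, ] <- c(mean(p1 == test$asthma), mean(p2 == test$asthma))
}
colMeans(acc)  # average accuracy of each model across the splits
# the better model is then refit on all of `dat` as the final model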

However, I am confused about how to build my final model for application after determining the better model with cv.glmnet. I would have imagined doing something similar to before: take the best model, which in this case means the best lambda value (and alpha with elastic net), and run glmnet on the entire data set, passing it the best lambda from cv.glmnet. However, according to this Cross Validated discussion:

"you're not actually supposed to give glmnet a single value of lambda. "

So, my concrete question is: how do I implement the results of cv.glmnet in order to build my final model?

EDIT:
Based on the comments following the response by "Jogi", it appears that the coefficients of the best cv.glmnet model are the same as the coefficients obtained when the best lambda from cv.glmnet is used to fit glmnet on the entire dataset. Is this basically the answer to my question? If so, can anyone elaborate on why this is the case?

Here is a sample:

library(glmnet)

age     <- c(4, 8, 7, 12, 6, 9, 10, 14, 7) 
gender  <- as.factor(c(1, 0, 1, 1, 1, 0, 1, 0, 0))
bmi_p   <- c(0.86, 0.45, 0.99, 0.84, 0.85, 0.67, 0.91, 0.29, 0.88) 
m_edu   <- as.factor(c(0, 1, 1, 2, 2, 3, 2, 0, 1))
p_edu   <- as.factor(c(0, 2, 2, 2, 2, 3, 2, 0, 0))
f_color <- as.factor(c("blue", "blue", "yellow", "red", "red", "yellow", 
                   "yellow", "red", "yellow"))
asthma <- c(1, 1, 0, 1, 0, 0, 0, 1, 1)
xfactors <- model.matrix(asthma ~ gender + m_edu + p_edu + f_color)[, -1]
x        <- as.matrix(data.frame(age, bmi_p, xfactors))

#Lastly, cross validation can also be used to select lambda.
cv.glmmod <- cv.glmnet(x, y = asthma, alpha = 1, family = "binomial")
#plot(cv.glmmod)
(best.lambda <- cv.glmmod$lambda.min)
coef(cv.glmmod, s = "lambda.min")

which outputs:

coef(cv.glmmod, s = "lambda.min")
11 x 1 sparse Matrix of class "dgCMatrix"
                      1
(Intercept)   0.2231436
age           .
bmi_p         .
gender1       .
m_edu1        .
m_edu2        .
m_edu3        .
p_edu2        .
p_edu3        .
f_colorred    .
f_coloryellow .

And the full dataset coefficients are:

fit <- glmnet(x, y = as.factor(asthma), lambda = best.lambda, family = "binomial", alpha = 1)
coef(fit)

coef(fit)
11 x 1 sparse Matrix of class "dgCMatrix"
                     s0
(Intercept)   0.2231436
age           0.0000000
bmi_p         .
gender1       .
m_edu1        .
m_edu2        .
m_edu3        .
p_edu2        .
p_edu3        .
f_colorred    .
f_coloryellow .

Best Answer

Instead of you fitting a penalized regression for each candidate lambda and performing the cross validation manually, the cv.glmnet function does this automatically:

library(glmnet)
data(QuickStartExample)
# note: in recent glmnet versions the example data loads as a list;
# if so, use: x <- QuickStartExample$x; y <- QuickStartExample$y

# your approach: pick a lambda yourself and cross validate manually
fit_1 <- glmnet(x, y, lambda = 1)

# glmnet's approach: automated cross validation over a whole lambda sequence
cvfit <- cv.glmnet(x, y)
plot(cvfit)

# coefficients of the final model
coef_cv <- coef(cvfit, s = "lambda.min")
# predictions of the final model
predict(cvfit, newx = x[1:5, ], s = "lambda.min")

# extract the optimal lambda
lambda_opt <- cvfit$lambda.min

# manually plugging lambda into glmnet
fit_2 <- glmnet(x, y, lambda = lambda_opt)

# compare coefficients - equal
cbind(coef_cv, coef(fit_2))

# compare predictions - equal
cbind(predict(cvfit, newx = x[1:5, ], s = "lambda.min"),
      predict(fit_2, newx = x[1:5, ]))

So for each candidate lambda, a cross-validated performance measure is calculated, and via plot(cvfit) you can inspect the result of the cross validation. Recall that, in general, using glmnet() and plugging in arbitrary single lambdas is not recommended. More details can be found in the excellent tutorial: https://web.stanford.edu/~hastie/Papers/Glmnet_Vignette.pdf
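Regarding the edit in the question (why the coefficients agree): cv.glmnet() fits glmnet() on the full data over the whole lambda sequence anyway and stores that fit in its glmnet.fit component, and coef() and predict() applied to the cv object are read off this stored full-data fit. Refitting glmnet() on all the data at lambda.min therefore reproduces essentially the same model; the warning quoted in the question exists because glmnet relies on warm starts along the lambda path, so fitting a whole path is both faster and numerically safer than a single-lambda fit. A small sketch, continuing from the code above:

# the full-data fit over the entire lambda path is stored inside the cv object
class(cvfit$glmnet.fit)

# coefficients at lambda.min, read off the stored full-data path -
# identical to coef(cvfit, s = "lambda.min")
cbind(coef(cvfit, s = "lambda.min"),
      coef(cvfit$glmnet.fit, s = cvfit$lambda.min))

# the numbers behind plot(cvfit): lambda sequence, CV error, and its sd
head(cbind(lambda = cvfit$lambda, cv_error = cvfit$cvm, cv_sd = cvfit$cvsd))

# a common, more conservative alternative to lambda.min
cvfit$lambda.1se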