Solved – Interpreting glmnet cox coefficients

cox-model, elastic-net, glmnet, survival

Similar questions have been asked about interpreting glmnet results, but this one is specific to the Cox part of the package.

I am trying to create a prognostic score for cancer patients using the Cox model in the glmnet R package.

After running the cv.glmnet function I get a series of coefficients. This is where I'm stuck.

How do I translate these into the traditional table of hazard ratios, confidence intervals and p-values?
Does the package simply help select the variables, which I then rerun in a normal full Cox regression model?

Best Answer

The coefficients that you get from cv.glmnet are the ones that remain after applying the lasso penalty (lasso, alpha = 1, is the default; glmnet also offers ridge and elastic net regression through the alpha argument). The size of the penalty is set by the parameter lambda, and cv.glmnet chooses lambda by k-fold cross-validation (10 folds by default). This means that whether a predictor is retained in the penalized model is guided by cross-validated predictive performance (for Cox models, the partial likelihood deviance or the concordance index), not by significance tests. This is why glmnet provides no p-values. In fact, there is no guarantee that the predictors selected by cv.glmnet are significantly associated (in the traditional sense) with the outcome.
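As a rough illustration of the step described above, here is a minimal sketch of a cross-validated lasso Cox fit. The data are simulated and all variable names (x1 to x10, time, status) are made up for the example; only glmnet's documented cv.glmnet and coef calls are used.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 10

# Simulated predictors (hypothetical names x1..x10)
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))

# Assumption for the example: only x1 and x2 affect the hazard
lp <- 0.8 * x[, 1] - 0.6 * x[, 2]
event_time <- rexp(n, rate = exp(lp))
cens_time <- rexp(n, rate = 0.3)

# Observed follow-up time and event indicator (1 = event, 0 = censored)
time <- pmin(event_time, cens_time)
status <- as.numeric(event_time <= cens_time)

# For family = "cox", glmnet accepts a two-column matrix named time/status
y <- cbind(time = time, status = status)

# Lasso (alpha = 1) with 10-fold CV on the partial likelihood deviance
cvfit <- cv.glmnet(x, y, family = "cox", alpha = 1,
                   nfolds = 10, type.measure = "deviance")

# lambda.min minimises the CV error; lambda.1se is the more heavily
# penalised value within one standard error of that minimum
cvfit$lambda.min
cvfit$lambda.1se

# Penalised coefficients at lambda.min (sparse matrix, no intercept for Cox)
beta <- coef(cvfit, s = "lambda.min")
print(beta)
```

Coefficients printed as "." are exactly zero, meaning the penalty dropped those predictors. Using s = "lambda.1se" instead usually gives a sparser model.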

glmnet also does not provide the standard errors that would be needed to calculate 95% confidence intervals, although it is possible to compute them. The reason is that estimates from penalized regression are biased, so the accompanying SEs are not very meaningful: it is unclear to what extent they reflect bias or variance in the estimates.

The coefficients provided are betas, which correspond to log hazard ratios (HR). To obtain the HRs, exponentiate the coefficients.
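The conversion from coefficients to hazard ratios can be sketched in a few lines of base R. The helper name hazard_ratios and the coefficient values below are hypothetical, chosen only for illustration; with a real fit you would pass in something like coef(cvfit, s = "lambda.min").

```r
# Hypothetical helper: turn a coefficient vector (or one-column matrix)
# into a table of the non-zero log hazard ratios and hazard ratios
hazard_ratios <- function(beta) {
  b <- as.matrix(beta)[, 1]   # named numeric vector of coefficients
  b <- b[b != 0]              # keep only predictors the penalty retained
  data.frame(variable = names(b),
             log_hr = unname(b),
             hr = unname(exp(b)),
             row.names = NULL)
}

# Made-up penalised coefficients; marker_a was shrunk to exactly zero
beta <- c(age = 0.05, stage = 0.69, marker_a = 0, marker_b = -0.22)

hr_table <- hazard_ratios(beta)
print(hr_table)
```

Each HR is the multiplicative change in hazard per one-unit increase in that predictor, holding the others fixed. Because the coefficients are shrunk by the penalty, these HRs tend to be pulled towards 1 compared with an unpenalized fit.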

Indeed, the analysis carried out with cv.glmnet can be seen as a tool for selecting potentially important predictors, which can then be used in subsequent analyses such as an ordinary Cox regression.
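One way this two-step approach might look is sketched below: select variables with cv.glmnet, then refit an unpenalized Cox model with survival::coxph on the selected variables to get the usual table. The data are simulated and the variable names are hypothetical. Keep in mind that the confidence intervals and p-values from such a refit ignore the selection step, so they are likely to be optimistic.

```r
library(glmnet)
library(survival)

set.seed(1)
n <- 200
p <- 10

# Simulated data; assumption: only x1 and x2 truly affect the hazard
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
lp <- 0.8 * x[, 1] - 0.6 * x[, 2]
event_time <- rexp(n, rate = exp(lp))
cens_time <- rexp(n, rate = 0.3)
time <- pmin(event_time, cens_time)
status <- as.numeric(event_time <= cens_time)

# Step 1: penalised Cox regression to select predictors
cvfit <- cv.glmnet(x, cbind(time = time, status = status), family = "cox")
b <- as.matrix(coef(cvfit, s = "lambda.min"))[, 1]
selected <- names(b)[b != 0]
stopifnot(length(selected) > 0)  # nothing to refit if all were dropped

# Step 2: ordinary (unpenalised) Cox model on the selected predictors
dat <- data.frame(time = time, status = status, x)
form <- as.formula(paste("Surv(time, status) ~",
                         paste(selected, collapse = " + ")))
refit <- coxph(form, data = dat)

# Hazard ratios with 95% confidence intervals, then the coefficient table
# (which includes the Wald p-values)
summary(refit)$conf.int
summary(refit)$coefficients
```

If honest uncertainty estimates matter, splitting the data (select on one part, refit on the other) or bootstrapping the whole select-and-refit procedure are common ways to reduce this optimism.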