Solved – Interpretation of coefficients of glmnet – LASSO/Cox model

I have fitted a LASSO Cox model to a large dataset of 10K observations and 1200 variables.

library(glmnet)
library(survival)  # provides Surv()
fit    <- glmnet(   x, Surv(time, status), alpha=1, family='cox')
cv.fit <- cv.glmnet(x, Surv(time, status), alpha=1, family='cox')

After cross-validation the model selected 56 variables with non-zero coefficients; some of the coefficients are negative and some are positive. Can we say anything about the significance of these variables based on their coefficient values?

What can we say about coefficients with negative values versus coefficients with positive values?

Some variables and their coefficient values:
 CSI_SUPPORT               -2.51E-19
 Power.Glass.Moonroof       0.046261522
 FLOOR_PLAN_SUPPORT        -0.005169085
 R.Design.Nubuck.Off.Black  0.254841459
 TOTAL_AMOUNT              -6.19E-05
 K36100                    -0.062819229
 K36100                    -0.237663697
 Textile.Off.Black.seats    0.159802697
 Design.Leather.Black      -0.401298769
 MARKETING_SUPPORT         -0.000182012

Best Answer

The LASSO fit does not carry information on statistical significance.

The coefficients have roughly the same interpretation as in a standard Cox model, that is, as log hazard ratios. A positive coefficient indicates that higher values of the variable are associated with a higher risk of the event, and a negative coefficient indicates a lower risk. How important the effects are depends on what the variables stand for and on subject knowledge.
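
As a small illustration of the log hazard ratio reading, the sketch below exponentiates a few of the coefficients copied from the question. It assumes each coefficient refers to a one-unit increase in its variable, and the vector name `beta` is only chosen for this example. Because the LASSO shrinks coefficients towards zero, the resulting hazard ratios are probably pulled towards 1 compared with an unpenalised Cox fit.

```r
# Coefficients copied from the question's table (log hazard ratios)
beta <- c(
  Power.Glass.Moonroof      =  0.046261522,
  R.Design.Nubuck.Off.Black =  0.254841459,
  Design.Leather.Black      = -0.401298769,
  TOTAL_AMOUNT              = -6.19e-05
)

# exp(beta) is the hazard ratio for a one-unit increase in the variable
hazard_ratio <- exp(beta)
print(round(hazard_ratio, 4))

# Sign of the coefficient gives the direction of the association
direction <- ifelse(beta > 0, "higher hazard", "lower hazard")
print(direction)
```

For example, the coefficient of about -0.40 for Design.Leather.Black corresponds to a hazard ratio of about 0.67, i.e. roughly a third lower hazard, whereas the tiny coefficient for TOTAL_AMOUNT is per unit of that variable and may still matter if the variable ranges over thousands.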

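Note that glmnet standardises the predictors internally by default (standardize = TRUE) but reports the coefficients on the original scale, so coefficients of variables measured in different units are not directly comparable. Depending on the distribution of these variables, you could scale them to unit variance yourself before fitting the LASSO, which would produce standardised coefficients as a rough measure of relative variable importance.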
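
A minimal sketch of that scaling step, on simulated stand-in data rather than the data from the question (the dimensions, effect sizes and names such as `x_std` are assumptions made for this example). It also assumes a glmnet version that accepts a `Surv` object as the response, as in the question's own code.

```r
library(glmnet)
library(survival)

set.seed(1)
# Simulated stand-in data (hypothetical; replace with your own x, time, status)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)
x[, 1] <- x[, 1] * 1000          # one predictor on a much larger scale
colnames(x) <- paste0("V", seq_len(p))
time   <- rexp(n, rate = exp(0.5 * x[, 2] - 0.5 * x[, 3]))
status <- rbinom(n, 1, 0.7)

# Centre each column and scale it to unit variance
x_std <- scale(x)

# x_std is already scaled, so glmnet's internal standardisation is switched off
cv.fit <- cv.glmnet(x_std, Surv(time, status), alpha = 1,
                    family = "cox", standardize = FALSE)

# Coefficients at the CV-selected lambda, as a named numeric vector
b <- coef(cv.fit, s = "lambda.min")[, 1]

# Non-zero coefficients ordered by absolute size (per standard deviation)
nonzero <- b[b != 0]
print(nonzero[order(abs(nonzero), decreasing = TRUE)])
```

Each coefficient is then a log hazard ratio per one standard deviation of its variable, which makes the magnitudes comparable across variables. For binary indicator variables a per-standard-deviation effect is harder to interpret, so the ranking is best treated as a rough guide.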
