Solved – How to interpret coefficients of a multinomial elastic net (glmnet) regression

caret, elastic net, glmnet, multinomial logit, multinomial-distribution

I'm trying to model membership in one of three well-being clusters (flourisher, normative, languisher) based on a set of predictors, using elastic net for both variable selection & modelling. I first use the caret package in combination with the glmnet package to run a 10-fold cross-validation of a multinomial logistic regression and find the optimal values for $\alpha$ and $\lambda$:

library(caret)
library(glmnet)

set.seed(123456)
# Random search over 25 alpha/lambda pairs, tuned by repeated 10-fold CV
elastic_net <- train(cluster_membership ~ ., data = data_multi,
                     method = "glmnet", tuneLength = 25,
                     trControl = trainControl(method = "repeatedcv",
                                              search = "random"))
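
Once training finishes, the selected tuning parameters can be read off the caret object; this is where the $\alpha$ and $\lambda$ used in the refit below come from:

# Optimal alpha/lambda pair found by the random search
elastic_net$bestTune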

After that, I refit a glmnet model with the optimal $\alpha$ and $\lambda$ and pull out the coefficients of the model:

x <- model.matrix(cluster_membership ~ ., data = data_multi)
y <- data_multi$cluster_membership

# Refit at the tuned alpha/lambda; model.matrix() adds its own intercept
# column, which is why a second "(Intercept)" row appears in the output
elastic_mod <- glmnet(x, y, family = "multinomial",
                      alpha = elastic_net$bestTune$alpha,
                      lambda = elastic_net$bestTune$lambda)

# lambda is already fixed above, so coef() needs no 's' argument
# ('lambda.min' only exists for cv.glmnet objects)
coef(elastic_mod)

My question is: why are there three sets of coefficients & how do I interpret them? I understand that in ordinary multinomial regression, one of the outcome categories is used as a reference level & there are therefore $K - 1$ sets of coefficients for $K$ categories (e.g. in my case, there would be one set predicting flourisher vs normative, and another predicting languisher vs normative). However, this is not the case in the glmnet multinomial output, where there are clearly three sets of coefficients:

$flourisher
24 x 1 sparse Matrix of class "dgCMatrix"
                                        1
(Intercept)                  -1.097799622
(Intercept)                   .          
genderFemale                  .          
ethnicAsian                  -0.209824802
ethnicMaori/Pacific Islander  .          
ethnicOther                  -0.976265266
age                           .          
BMI                           .          
BMR_kcal                      .          
CRPmgL                        .          
md_ndrink                     .          
md_refresh                    0.262365972
md_sleep                      .          
md_dstress                   -0.135152345
md_dwrkld                     .          
md_dcreate                   -0.028441041
md_dpac                       0.007127653
md_dfruit                     .          
md_dveg                       .          
md_dchips                    -0.091275151
md_dsweets                    .          
md_dsoftdrk                  -0.069636339
md_dmood_happy                1.514874739
md_dmood_sad                 -1.167555384

$normative
24 x 1 sparse Matrix of class "dgCMatrix"
                                      1
(Intercept)                  1.10289610
(Intercept)                  .         
genderFemale                 0.11272708
ethnicAsian                  0.06757730
ethnicMaori/Pacific Islander .         
ethnicOther                  .         
age                          .         
BMI                          .         
BMR_kcal                     .         
CRPmgL                       .         
md_ndrink                    .         
md_refresh                   .         
md_sleep                     .         
md_dstress                   0.01914515
md_dwrkld                    .         
md_dcreate                   .         
md_dpac                      .         
md_dfruit                    .         
md_dveg                      .         
md_dchips                    .         
md_dsweets                   .         
md_dsoftdrk                  .         
md_dmood_happy               .         
md_dmood_sad                 .         

$languisher
24 x 1 sparse Matrix of class "dgCMatrix"
                                        1
(Intercept)                  -0.005096477
(Intercept)                   .          
genderFemale                  .          
ethnicAsian                   .          
ethnicMaori/Pacific Islander  .          
ethnicOther                   0.037015029
age                           .          
BMI                           0.105868542
BMR_kcal                      .          
CRPmgL                        .          
md_ndrink                    -0.027819620
md_refresh                   -0.020768855
md_sleep                      .          
md_dstress                    .          
md_dwrkld                     .          
md_dcreate                    0.121300148
md_dpac                      -0.016322027
md_dfruit                     .          
md_dveg                       .          
md_dchips                     .          
md_dsweets                    .          
md_dsoftdrk                   .          
md_dmood_happy               -0.032887466
md_dmood_sad                  1.028937162

How do I interpret the coefficients for each category? Am I right that there is no explicit reference category?

Best Answer

I emailed Dr. Hastie, the maintainer of the glmnet package, and he kindly sent the following answer:

In the traditional case, the base category is arbitrary. In fact you can take a fitted model where say category one is the base category, and simply by subtraction of coefficients, make an equivalent model where another is the base (and the fit is identical). (Care must be taken with the standard errors).

Concretely, if category 1 is the base, and you have coefficient vector $\beta_k$ for category $k$, $k = 2, \ldots, K$ (with $\beta_1 = 0$), you can make, say, category $K$ the base. In this case the new coefficients would be $\beta'_k = \beta_k - \beta_K$ and the fitted probabilities would be unchanged.

With glmnet we chose a symmetric option instead, because we use regularization. With regularization, it would matter and make a difference if you used an asymmetric representation because of the way the shrinking works.

I like the type.multinomial = "grouped" option. In this case a group lasso penalty is applied to the set of coefficients for each feature, and the estimated coefficients average 0.

Again, you can post hoc move to an asymmetric representation as above without changing the fitted model.
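
To make this concrete: in glmnet's symmetric parameterisation every category $k$ gets its own linear predictor, $P(Y = k \mid x) \propto \exp(\beta_{0k} + x^\top \beta_k)$, so the directly interpretable quantities are differences between categories, e.g. $\log\frac{P(Y = k \mid x)}{P(Y = m \mid x)} = (\beta_{0k} - \beta_{0m}) + x^\top(\beta_k - \beta_m)$. Below is a minimal sketch of the post hoc re-parameterisation Dr. Hastie describes, assuming the elastic_mod object fitted in the question and taking normative as the reference; the fitted probabilities are unchanged.

# Symmetric-parameterisation coefficients: a named list with one sparse
# column vector per outcome category (flourisher, normative, languisher)
cf <- coef(elastic_mod)

# Post hoc shift to a reference-category representation: subtract the
# normative coefficients from every category, so normative becomes all zeros
cf_ref <- lapply(cf, function(b) b - cf$normative)

# Each entry of cf_ref$flourisher is now the change in the log-odds of
# flourisher vs normative for a one-unit increase in that predictor
round(as.matrix(cf_ref$flourisher), 3)

For example, the md_dmood_happy entry of cf_ref$flourisher is then the penalised shift in the log-odds of flourisher relative to normative per one-unit increase in md_dmood_happy, which is the reference-category reading the question was looking for.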