Standardizing dummy variables for variable importance in glmnet

categorical data, importance, interpretation, logistic, standardization

I've used glmnet to build a binomial logistic regression model and I'm now trying to determine the importance of the variables in the model. I've read a few posts about how to get the standardised coefficient values from the model, for example:

Initially, I approached this by standardizing every column of my model matrix, continuous and dummy variables alike, and setting standardize = FALSE in glmnet.
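A minimal sketch of that preprocessing step, written in Python with only the standard library purely for illustration. The toy columns and the `zscore` helper are hypothetical, and the use of the population standard deviation is an assumption, not something glmnet prescribes.

```python
from statistics import mean, pstdev


def zscore(column):
    """Center a column and divide by its population standard deviation.

    Hypothetical helper for illustration only.
    """
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]


# Hypothetical toy data: one continuous predictor and one 0/1 dummy.
age = [23.0, 35.0, 41.0, 52.0, 29.0, 60.0]
is_smoker = [0, 0, 1, 0, 0, 1]

age_z = zscore(age)
smoker_z = zscore(is_smoker)

# After z-scoring, the dummy no longer takes the values 0 and 1.
# The gap between its two levels is 1 / sd, and for a 0/1 column
# sd = sqrt(p * (1 - p)), so the gap depends on how common the category is.
dummy_gap = max(smoker_z) - min(smoker_z)
print("gap between dummy levels after z-scoring:", dummy_gap)
```

The point of the sketch is that a standardized dummy coefficient describes a one-SD change in the dummy, not a switch from one category to the other, which is part of what makes it hard to interpret.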

However, as noted in this free example chapter (pdf) on dummy variables in regression (section 7.4), there is no value in standardizing the coefficients of dummy variables if the aim is interpretation.

My question is: Is standardizing dummy variables (as well as continuous) and examining the magnitude of the standardized coefficients a valid approach for determining variable importance? Can I say that if a standardized coefficient for a given dummy variable is larger in magnitude than a standardized coefficient for a given continuous variable, the dummy variable is more important?

Best Answer

One option, advocated by Andrew Gelman, is to scale the continuous variables by dividing them by two standard deviations while leaving the binary variables in their raw 0/1 form. A one-unit change in a rescaled continuous variable then corresponds to moving from 1 SD below the mean to 1 SD above it, which is treated as comparable to going from 0 to 1 (i.e., from Category A to Category B) on a binary variable. You can find his paper here.
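The rescaling described above can be sketched as follows. This is an illustration in Python using only the standard library, not code from the paper. The column names and the `scale_two_sd` helper are hypothetical, and centering and the population SD are assumptions made for the example. Centering shifts only the intercept and leaves the slope unchanged.

```python
from statistics import mean, pstdev


def scale_two_sd(column):
    """Center a continuous column and divide by two standard deviations.

    Hypothetical helper; centering is an assumption and does not affect slopes.
    """
    m, s = mean(column), pstdev(column)
    return [(x - m) / (2 * s) for x in column]


# Hypothetical toy data.
income = [31.0, 45.0, 52.0, 38.0, 70.0, 64.0]  # continuous: gets rescaled
owns_home = [0, 1, 1, 0, 1, 0]                 # binary: left as raw 0/1

income_scaled = scale_two_sd(income)

# Moving from (mean - 1 SD) to (mean + 1 SD) on the original scale
# is exactly one unit on the rescaled variable.
m, s = mean(income), pstdev(income)
low = (m - s - m) / (2 * s)
high = (m + s - m) / (2 * s)
unit_change = high - low

# The rescaled column has SD 0.5, which matches the SD of a balanced 0/1 dummy.
print("SD of rescaled income:", pstdev(income_scaled))
print("SD of balanced dummy: ", pstdev(owns_home))
print("(-1 SD to +1 SD) in rescaled units:", unit_change)
```

With the columns prepared this way, the model would be fitted with standardize = FALSE so that glmnet does not standardize the inputs again. A coefficient on a rescaled continuous variable and a coefficient on a raw dummy can then be read on roughly the same footing.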