How to rank coefficients returned from a ridge regression

caret, glmnet, ridge-regression

I am running a ridge regression using glmnet (alpha = 0) and would like to interpret the coefficients it returns. I know there isn't really a significance test for this, but can I at least rank the variables by importance? I am interested in explanatory, not predictive, power, which is why this matters to me.
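For context, the glmnet documentation writes the objective for the Gaussian family as follows (observation weights omitted here). Setting `alpha = 0` removes the $\ell_1$ term and leaves a pure ridge penalty:

$$ \min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^\top\beta\bigr)^2+\lambda\Bigl[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr] $$

$$ \alpha = 0:\quad \min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^\top\beta\bigr)^2+\frac{\lambda}{2}\lVert\beta\rVert_2^2 $$

The penalty acts on the size of each coefficient, so its effect depends on the scale of each predictor. For that reason glmnet standardizes x by default (`standardize = TRUE`), and it always reports the coefficients on the original scale.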

Here are some of my thoughts on how to do this:

  1. Standardize the data myself before running it through glmnet, pass standardize=F to glmnet, and simply sort the coefficients by absolute magnitude. I'm not sure this is correct, but someone suggested it for the lasso elsewhere.
  2. If I run the regression through caret, it provides a varImp function. I don't know how it computes importance, but the results look sensible. How does it work, and if it is correct, can I implement it for plain glmnet without caret?
  3. Someone recommended that I compute a confidence interval for each coefficient and see how far it is from 0. Any variable whose interval includes 0 would be considered unimportant.
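Option 1 can be sketched in a few lines. This version uses Python/NumPy with a closed-form ridge fit as a stand-in for glmnet. The helper names, the simulated predictors `x_a` and `x_b`, and the penalty `lam` are hypothetical, and `lam` is not on glmnet's `lambda` scale. The sketch shows that a ranking by absolute coefficient size depends on predictor scale: the same data produce different rankings on the raw and standardized scales.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge fit without intercept: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def rank_by_magnitude(coefs, names):
    """Return predictor names sorted by decreasing absolute coefficient."""
    order = np.argsort(-np.abs(coefs))
    return [names[i] for i in order]


rng = np.random.default_rng(1)
n = 1000
x_a = 10 * rng.normal(size=n)  # hypothetical predictor with sd of about 10
x_b = rng.normal(size=n)       # hypothetical predictor with sd of about 1
y = 0.2 * x_a + 1.0 * x_b + rng.normal(size=n)
names = ["x_a", "x_b"]

# Center everything so the model needs no intercept.
X_c = np.column_stack([x_a, x_b])
X_c = X_c - X_c.mean(axis=0)
y_c = y - y.mean()

# Raw scale: coefficients are "change in y per unit of x".
b_raw = ridge_coefficients(X_c, y_c, lam=1.0)

# Standardized scale: coefficients are "change in y per standard deviation of x".
X_std = X_c / X_c.std(axis=0)
b_std = ridge_coefficients(X_std, y_c, lam=1.0)

print(rank_by_magnitude(b_raw, names))  # → ['x_b', 'x_a']
print(rank_by_magnitude(b_std, names))  # → ['x_a', 'x_b']
```

After standardizing, each coefficient measures the change in y per standard deviation of its predictor, which is usually what option 1 intends. Raw-scale magnitudes mix effect size with measurement units.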

One problem is that options 1 and 2 give me different results, so I am not sure which to trust.

Edit: I see from this answer that caret's varImp function (option 2) actually just returns the magnitude of the coefficients (option 1).

Best Answer

1) Ridge regression shrinks perfectly correlated predictors equally. Suppose that your true model is:

$$ Y = X_1 + X_2 + 2X_3 + \epsilon $$

where $X_1$ and $X_2$ are perfectly correlated and, for simplicity, identical ($X_1 = X_2$), and $X_3$ is uncorrelated with both. Then, depending on how you specify the model, you would get the following fits:

  • $X_1$ and $X_2$ in: $Y = X_1 + X_2 + 2X_3$
  • Only $X_1$ in: $Y = 2X_1 + 2X_3$
  • Only $X_2$ in: $Y = 2X_2 + 2X_3$

So the variable-importance ranking depends heavily on which variables are available and which are included in the model. Worse, a reasonable method would probably say that $X_1$ and $X_2$ are equally important and that, as a set, they are as important as $X_3$. Ranking the first fit's coefficients by magnitude instead puts $X_3$ (coefficient 2) ahead of $X_1$ and $X_2$ (coefficient 1 each). There doesn't seem to be a way to recover this from a single ridge regression.
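The example above can be checked numerically. The sketch below uses Python/NumPy, with a closed-form ridge fit standing in for glmnet; the sample size, noise level, seed, and `lam` are arbitrary assumptions. With an exact duplicate column, ridge splits the combined effect evenly, because for a fixed sum $b_1 + b_2$ the penalty $b_1^2 + b_2^2$ is smallest when $b_1 = b_2$. Dropping either copy roughly doubles the remaining coefficient.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge fit without intercept: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1.copy()           # exact duplicate of x1, so corr(x1, x2) = 1
x3 = rng.normal(size=n)  # generated independently of x1 and x2
y = x1 + x2 + 2 * x3 + rng.normal(size=n)

lam = 1.0
b_both = ridge_coefficients(np.column_stack([x1, x2, x3]), y, lam)
b_only_x1 = ridge_coefficients(np.column_stack([x1, x3]), y, lam)
b_only_x2 = ridge_coefficients(np.column_stack([x2, x3]), y, lam)

print("X1, X2, X3 in:", np.round(b_both, 2))     # roughly [1, 1, 2]
print("only X1 in:   ", np.round(b_only_x1, 2))  # roughly [2, 2]
print("only X2 in:   ", np.round(b_only_x2, 2))  # roughly [2, 2]
```

With a duplicate column, plain least squares has no unique solution because $X^\top X$ is singular. Adding $\lambda I$ is what makes the even split unique.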

2) You answered this already.

3) One option would be bootstrapping: draw bootstrap samples of your training data and fit a ridge regression on each. This gives you a sampling distribution for each coefficient, from which you can derive intervals. This approach has issues similar to those in 1), though.
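A minimal sketch of that bootstrap follows, again in Python/NumPy with a closed-form ridge fit standing in for glmnet. The function names, the simulated data, `n_boot = 500`, and the percentile-interval method are assumptions for illustration. For simplicity `lam` is held fixed across resamples; in practice you might also re-select it inside each resample.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge fit without intercept: solve (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def bootstrap_ridge_intervals(X, y, lam, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap intervals for ridge coefficients (rows resampled)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices drawn with replacement
        draws[b] = ridge_coefficients(X[idx], y[idx], lam)
    tail = (1 - level) / 2
    lower = np.quantile(draws, tail, axis=0)
    upper = np.quantile(draws, 1 - tail, axis=0)
    return lower, upper


# Hypothetical data: true coefficients 1.5, 0.0 and -1.0, no intercept.
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] + 0.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(size=n)

lower, upper = bootstrap_ridge_intervals(X, y, lam=1.0)
for j, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"beta_{j}: [{lo:.2f}, {hi:.2f}]  covers 0: {lo <= 0 <= hi}")
```

Ridge estimates are shrunk toward zero, so these intervals describe the variability of the penalized estimator rather than giving calibrated confidence intervals for the true coefficients. With correlated predictors they inherit the instability described in 1).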
