Solved – Is this the correct way to run an adaptive LASSO

Tags: lasso, regression, regularization, ridge regression

I have been using the code here to run an adaptive LASSO in R using glmnet. Essentially it first runs ridge regression to get coefficients for each predictor. It then tunes $\lambda$ in the second step using a penalty factor of $1/|\hat{\beta}_j^{\text{ridge}}|$ for each predictor.
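In Python terms (the original uses R's glmnet, whose `penalty.factor` argument handles the weighting internally; the data and parameter values below are illustrative), the two-step recipe looks roughly like this, using the standard rescaling trick to turn a weighted Lasso into a plain one:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, Lasso

# Toy data, only for illustration (not from the original post)
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Step 1: ridge regression to get pilot coefficients
ridge = RidgeCV(alphas=np.logspace(-3, 3, 50)).fit(X, y)
w = 1.0 / np.abs(ridge.coef_)  # penalty factors: 1 / |ridge coefficient|

# Step 2: weighted Lasso via rescaling.
# Dividing column j by w_j means a unit L1 penalty on the rescaled
# coefficient c_j = w_j * beta_j, i.e. a penalty of w_j * |beta_j|.
X_scaled = X / w
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
beta = lasso.coef_ / w  # map back to the original scale
```

In glmnet you would instead pass `penalty.factor = w` directly; the rescaling above is what that option does under the hood.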

Best Answer

Yes, but it depends on what your goal is. It's a little complicated.

What is adaptive Lasso?

Adaptive Lasso was introduced in Zou (2006). It is a modification of the Lasso in which each coefficient, $\beta_j$, is given its own weight, $w_j$. The coefficients are estimated by minimizing the objective function,

$$ \underset{\beta}{\arg \min }\left\|\mathbf{y}-\sum_{j=1}^{p} \mathbf{x}_{j} \beta_{j}\right\|^{2}+\lambda \sum_{j=1}^{p} w_{j}\left|\beta_{j}\right|. $$
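Written directly in NumPy (a hypothetical helper, just to make the notation concrete), the objective is:

```python
import numpy as np

def adaptive_lasso_objective(beta, X, y, w, lam):
    """Residual sum of squares plus the weighted L1 penalty,
    ||y - X beta||^2 + lam * sum_j w_j * |beta_j|."""
    rss = np.sum((y - X @ beta) ** 2)
    penalty = lam * np.sum(w * np.abs(beta))
    return rss + penalty
```

With all $w_j = 1$ this reduces to the ordinary Lasso objective.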

The weights control the rate at which each coefficient is shrunk towards 0. The general idea is that smaller coefficients should leave the model before larger ones.

How do you choose the weights?

The adaptive Lasso is very general. You can set the weights however you'd like, and you'll get something out. But it is worth asking what the "best" set of weights is. Zou (2006) says you should choose the weights so that the adaptive Lasso estimates have the Oracle Property:

  1. You will always identify the set of nonzero coefficients...when the sample size is infinite
  2. The estimates are unbiased, normally distributed, and have the correct variance (Zou (2006) gives the technical definition)...when the sample size is infinite.

To ensure the adaptive Lasso has these properties, you need to choose the weights as $w_j = 1/|\hat{\beta}_j|^{\gamma}$, where $\gamma > 0$ and $\hat{\beta}_j$ is an unbiased estimate of the true parameter, $\beta_j$. Generally, people choose the Ordinary Least Squares (OLS) estimate of $\beta$ because it is unbiased. Ridge regression produces biased coefficient estimates, so you cannot guarantee the Oracle Property holds.
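A minimal sketch of the OLS-based weight construction (the data and the choice $\gamma = 1$ are illustrative assumptions, not from the original post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0])  # two truly-zero coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=100)

gamma = 1.0  # any gamma > 0 satisfies the condition
ols = LinearRegression().fit(X, y)
w = 1.0 / np.abs(ols.coef_) ** gamma  # w_j = 1 / |beta_hat_j|^gamma
```

Predictors whose true coefficients are zero get small OLS estimates, hence large weights, so they are penalized harder and leave the model sooner.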

What if I use something else for the weights?

What happens if you ignore the requirement of using unbiased estimates for the weights and use ridge regression instead? You can't guarantee you'll get the right subset of coefficients or that they'll have the correct distribution. In practice, this probably doesn't matter. The Oracle Property is an asymptotic guarantee (as $n \to \infty$), so it doesn't necessarily apply to your data with a finite number of observations. There may be scenarios where using ridge estimates for the weights performs very well. Zou (2006) recommends ridge regression over OLS when your variables are highly correlated.
