Solved – Equivalence between Elastic Net formulations

elastic net, lasso, optimization, regression, ridge regression

According to Zou and Hastie's paper, the elastic net has two equivalent formulations:

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^N\left(y_i-\sum_{j=1}^p x_{ij} \beta_j\right)^2 + \lambda_1 \sum_{j=1}^p |\beta_j|+ \lambda_2 \sum_{j=1}^p \beta_j^2 \right\}$$

and

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^N\left(y_i - \sum_{j=1}^p x_{ij} \beta_j\right)^2\right\} \;\text{ s.t. } \;(1-\alpha)\sum_{j=1}^p |\beta_j| + \alpha\sum_{j=1}^p \beta_j^2 \leq t$$

where $\alpha = \frac{\lambda_2}{\lambda_1 + \lambda_2}$.
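As a numerical sanity check of this correspondence (not a proof), here is a minimal Python sketch. It assumes synthetic data and a hand-written coordinate descent solver; the function names (`elastic_net_cd`, `penalty`, `rss`) and the values of $\lambda_1$, $\lambda_2$ are illustrative and do not come from the paper. It solves the penalized problem, sets $t$ to the constraint value at that solution, and then checks that randomly sampled points satisfying the constraint do not achieve a smaller residual sum of squares.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam1, lam2, n_sweeps=500):
    """Coordinate descent for ||y - X b||^2 + lam1 * ||b||_1 + lam2 * ||b||_2^2."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # x_j^T x_j for each column
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with the contribution of coordinate j removed.
            r_j = y - X @ beta + X[:, j] * beta[j]
            # Exact minimizer of the one-dimensional subproblem in beta_j.
            beta[j] = soft_threshold(X[:, j] @ r_j, lam1 / 2.0) / (col_sq[j] + lam2)
    return beta


def rss(X, y, beta):
    """Residual sum of squares ||y - X beta||^2."""
    return float(np.sum((y - X @ beta) ** 2))


def penalty(beta, alpha):
    """Constraint function (1 - alpha) * ||beta||_1 + alpha * ||beta||_2^2."""
    return float((1 - alpha) * np.abs(beta).sum() + alpha * np.sum(beta ** 2))


# Synthetic data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=50)

lam1, lam2 = 10.0, 5.0
alpha = lam2 / (lam1 + lam2)

# Penalized solution, and the bound t it induces for the constrained form.
beta_hat = elastic_net_cd(X, y, lam1, lam2)
t = penalty(beta_hat, alpha)

# Random candidates around beta_hat; keep only those satisfying the constraint.
candidates = beta_hat + rng.normal(scale=0.3, size=(5000, 5))
cand_pen = (1 - alpha) * np.abs(candidates).sum(axis=1) + alpha * (candidates ** 2).sum(axis=1)
mask = cand_pen <= t
cand_rss = ((y[None, :] - candidates[mask] @ X.T) ** 2).sum(axis=1)
best_feasible_rss = float(cand_rss.min())

print("alpha:", alpha)
print("t:", t)
print("RSS at penalized solution:", rss(X, y, beta_hat))
print("best RSS among sampled feasible points:", best_feasible_rss)
```

If the two formulations agree, no sampled feasible point should beat the penalized solution on residual sum of squares, up to solver tolerance. Random sampling can only fail to find a counterexample, so this illustrates the claim rather than establishing it.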

My question is how to prove this equivalence formally. Ridge regression and the lasso also have these two possible formulations, but I could not find any reference where this equivalence is proven. A similar question I found on Cross Validated is this one:

Lagrangian relaxation in the context of ridge regression

but I'm unable to understand Tristan's explanation. I have some understanding of Lagrangian optimization theory, and I guess the answer is along those lines, but since all the papers treat the equivalence as obvious, I would like to find a proper reference where it is explicitly demonstrated.

Best Answer

Starting from the constrained problem

$$\hat{\beta} = \arg \min_\beta \|X\beta - y\|_2^2 \;\text{ s.t. }\; (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 \leq t,$$

we can write the Lagrangian of this optimization problem, with multiplier $\lambda \geq 0$, as

$$ \begin{array}{rcl} L(\beta,\lambda) & = & \|X\beta - y\|_2^2 + \lambda \left( (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 - t\right) \\ & = & \|X\beta - y\|_2^2 + \lambda (1-\alpha)\|\beta\|_1 + \lambda\alpha\|\beta\|_2^2 - \lambda t. \end{array} $$

For a fixed $\lambda$, the term $-\lambda t$ does not depend on $\beta$, so minimizing $L$ over $\beta$ is the first problem that you wrote, with parameters $\lambda_1=\lambda (1-\alpha)$ and $\lambda_2=\lambda \alpha$, which leads to the expression of the "elastic" parameter:

$$\alpha = \frac{\lambda_2}{\lambda_1+\lambda_2}.$$

That being said, to go from this point to Zou and Hastie's assertion that both problems are equivalent, I admit that I am missing a step or two.
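The missing step can be sketched with a standard convex-duality argument. What follows is an outline under stated assumptions ($t > 0$, and the shorthand $f$ and $P$ introduced here for convenience), not a quotation from Zou and Hastie. Write $f(\beta) = \|X\beta - y\|_2^2$ and $P(\beta) = (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2$; both are convex.

Penalized $\Rightarrow$ constrained. Suppose $\hat{\beta}$ minimizes $f(\beta) + \lambda P(\beta)$ for some $\lambda \geq 0$, and set $t = P(\hat{\beta})$. Then for every $\beta$ with $P(\beta) \leq t$,

$$ f(\hat{\beta}) + \lambda t \;=\; f(\hat{\beta}) + \lambda P(\hat{\beta}) \;\leq\; f(\beta) + \lambda P(\beta) \;\leq\; f(\beta) + \lambda t, $$

so $f(\hat{\beta}) \leq f(\beta)$, i.e. $\hat{\beta}$ solves the constrained problem with this $t$.

Constrained $\Rightarrow$ penalized. For $t > 0$, the point $\beta = 0$ is strictly feasible because $P(0) = 0 < t$, so Slater's condition holds and strong duality applies. Hence there is a multiplier $\lambda^\ast \geq 0$ such that any solution $\hat{\beta}$ of the constrained problem satisfies

$$ \hat{\beta} \in \arg\min_\beta \left\{ f(\beta) + \lambda^\ast \left( P(\beta) - t \right) \right\}, \qquad \lambda^\ast \left( P(\hat{\beta}) - t \right) = 0. $$

Since $-\lambda^\ast t$ is constant in $\beta$, $\hat{\beta}$ minimizes $f(\beta) + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$ with $\lambda_1 = \lambda^\ast (1-\alpha)$ and $\lambda_2 = \lambda^\ast \alpha$. If the constraint is inactive (the least-squares solution already satisfies $P(\beta) \leq t$), then $\lambda^\ast = 0$ and both penalties vanish.

Note that the correspondence between $t$ and $\lambda$ depends on the data $X$ and $y$; the argument shows that a matching value exists, not that it has a closed form.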