Group elastic net

Tags: categorical data, elastic net, feature selection, lasso, machine learning

The lasso and the elastic net cannot directly handle categorical variables with more than two categories, so such variables have to be split into dummy variables before these methods can be applied. This can cause several problems (for example, only some of the dummies belonging to one variable being selected), which is why extensions of the lasso such as the group lasso and the sparse group lasso exist.

However, I am wondering whether such extensions also exist for the elastic net. Unfortunately, I wasn't able to find any statistical literature on the topic.

Question: Does a group elastic net exist?

Best Answer

Let $\mathcal{G}$ be the grouping that you're interested in; that is, let $\mathcal{G}$ be a partition of $\{1, \dots, p\}$, where we consider there to be $p$ features. For a group $g \in \mathcal{G}$, write $|g|$ for its size and $\beta_g$ for the corresponding sub-vector of $\beta$. With response $y \in \mathbb{R}^n$ and design matrix $X \in \mathbb{R}^{n \times p}$, the group lasso estimator is

$$\arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n} \|y - X \beta \|_2^2 + \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2.$$

Applying another squared $\ell_2$ penalty to induce overall shrinkage, we'd get the estimator

$$\arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n} \|y - X \beta \|_2^2 + \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2 + \mu \|\beta\|_2^2.$$

We might call this the "group elastic net". By Lagrangian duality, for a given $\mu$ there is a radius $C$ and a multiplier $\tilde\mu$ (both depending on the data) such that the following problems share a solution:

\begin{align*} \arg\min_{\beta \in \mathbb{R}^p} & \frac{1}{2n} \|y - X \beta \|_2^2 + \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2 + \mu \|\beta\|_2^2 \\ = \, \arg\min_{\beta \in \mathbb{R}^p \, : \, \|\beta\|_2^2 \leq C} & \frac{1}{2n} \|y - X \beta \|_2^2 + \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2 \\ = \, \arg\min_{\beta \in \mathbb{R}^p \, : \, \|\beta\|_2 \leq \sqrt{C}} & \frac{1}{2n} \|y - X \beta \|_2^2 + \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2 \\ = \, \arg\min_{\beta \in \mathbb{R}^p} & \frac{1}{2n} \|y - X \beta \|_2^2 + \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2 + \tilde\mu \|\beta\|_2 \\ = \, \arg\min_{\beta \in \mathbb{R}^p} & \frac{1}{2n} \|y - X \beta \|_2^2 + \left( \lambda \sum_{g \in \mathcal{G}} |g|^{1/2} \|\beta_g\|_2 + \tilde\mu' p^{1/2} \|\beta\|_2 \right), \end{align*}

where $\tilde\mu$ is the corresponding dual variable and $\tilde\mu' = p^{-1/2} \tilde\mu$. Comparing stationarity conditions shows that $\tilde\mu = 2 \mu \|\hat\beta\|_2$ whenever the solution $\hat\beta$ is nonzero, so the correspondence between $\mu$ and $\tilde\mu$ depends on the data.

As we can see, this last expression is a group lasso with "overlapping" groups, since $\mathcal{G} \cup \{\{1, \dots, p\}\}$ is no longer a partition. Further, the group $\{1, \dots, p\}$ has its own dual variable (or tuning parameter) $\tilde\mu'$, which is distinct from the tuning parameter $\lambda$ shared by the other groups.
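The squared-$\ell_2$ form (the second display above) can also be solved directly, without passing to overlapping groups. Below is a minimal sketch in Python using proximal gradient descent. This is a swapped-in technique for illustration, not the gglasso route this answer recommends. The function names `group_soft_threshold` and `group_elastic_net`, the iteration count, and the toy data are all assumptions. The sketch relies on the fact that the proximal operator of the combined penalty is a group soft-threshold followed by a rescaling by $1/(1 + 2t\mu)$, where $t$ is the step size.

```python
import numpy as np


def group_soft_threshold(v, groups, thresholds):
    """Shrink each block v[g] toward zero by its threshold (in Euclidean norm)."""
    out = np.zeros_like(v)
    for g, t in zip(groups, thresholds):
        norm = np.linalg.norm(v[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * v[g]
    return out


def group_elastic_net(X, y, groups, lam, mu, n_iter=2000):
    """Proximal gradient sketch for
    (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2 + mu * ||b||_2^2,
    where `groups` is a list of index lists that partition the columns of X.
    """
    n, p = X.shape
    weights = np.array([np.sqrt(len(g)) for g in groups])
    # Step size 1/L, with L the Lipschitz constant of the least-squares gradient.
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        v = beta - step * grad
        # Prox of the penalty: group soft-threshold, then ridge rescaling.
        beta = group_soft_threshold(v, groups, step * lam * weights)
        beta = beta / (1.0 + 2.0 * step * mu)
    return beta


# Toy data (hypothetical): three groups, the middle one has no true effect.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 1.5])
y = X @ beta_true + 0.1 * rng.standard_normal(50)
groups = [[0, 1], [2, 3, 4], [5]]

beta_hat = group_elastic_net(X, y, groups, lam=0.1, mu=0.05)
print(np.round(beta_hat, 3))
```

Setting `lam=0` reduces the sketch to ridge regression, which gives a convenient sanity check against the closed-form ridge solution; a very large `lam` sets every group to zero.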

The overlapping group lasso problem above can be solved using the R package gglasso. The section on page 9 of the package documentation describes the gglasso function, which is the one to use. Note that the per-group penalty-factor argument (`pf`) will have to be supplied manually, with a last component that serves as the tuning parameter for the group $\{1, \dots, p\}$.
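Whichever solver is used, it needs to know which columns of the dummy-coded design matrix belong to which original variable, together with the $|g|^{1/2}$ weights. A minimal sketch of building both, assuming a hypothetical design in which a four-level factor contributes three dummies, a numeric variable contributes one column, and a three-level factor contributes two dummies (the names `dummies_per_variable`, `group`, and `penalty_factor` are illustrative, not part of any package):

```python
import numpy as np

# Hypothetical: number of design-matrix columns produced by each original variable.
dummies_per_variable = [3, 1, 2]

# Consecutive integer group label for every column, starting at 1.
group = np.repeat(np.arange(1, len(dummies_per_variable) + 1), dummies_per_variable)

# Group sizes |g| and the corresponding sqrt(|g|) penalty weights.
group_sizes = np.bincount(group)[1:]
penalty_factor = np.sqrt(group_sizes)

print(group)
print(penalty_factor)
```

Labels of this kind (consecutive integers, one per column) are, as far as I know, the format group lasso software such as gglasso expects, but check the package documentation before relying on it.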