Troubleshooting Strange Results from the Fused Lasso Estimator

constrained-optimization, fused-lasso, lasso, optimization, regularization

Let us consider the following estimator:

$$
\hat{\beta}^{F} = \underset{\beta \in \mathbb{R}^{n}}{\arg \min} \; \sum_{i=1}^{n} (y_{i} - \beta_{i})^{2} + \lambda_{1} \sum_{i=1}^{n-1}|\beta_{i} - \beta_{i+1}|,
$$

which is called the fused lasso estimator; cf. "Properties and Refinements of the Fused Lasso" (Rinaldo, 2009).

Next, if I understand correctly, applying the fusion to the already-fused signal should not change the result, i.e. if
$$
\hat{\alpha}^{F} = \underset{\alpha \in \mathbb{R}^{n}}{\arg \min} \; \sum_{i=1}^{n} (\hat{\beta}^{F}_{i} - \alpha_{i})^{2} + \lambda_{1} \sum_{i=1}^{n-1}|\alpha_{i} - \alpha_{i+1}|,
$$

then $\hat{\alpha}^{F} = \hat{\beta}^{F}$, because projections are idempotent.

Nevertheless, let us try:

library("genlasso")
n=20
p0 = c(1/12,1/12,1/12,1/12,1/12,1/12,1/12,1/12,1/12,1/12,1/12,1/12)
lambda_f =  0.2
y = rmultinom(1, size = n, prob = p0)
a = fusedlasso1d(y, X = diag(1, length(p0), length(p0)), gamma = 0, approx = FALSE, maxsteps = 2000,
             minlam = 0, rtol = 1e-07, btol = 1e-07, eps = 1e-4,
             verbose = FALSE)
a_fl = softthresh(a, lambda =  lambda_f, gamma =  0)

b = fusedlasso1d(a_fl[,1], X = diag(1, length(p0), length(p0)), gamma = 0, approx = FALSE, maxsteps = 2000,
             minlam = 0, rtol = 1e-07, btol = 1e-07, eps = 1e-4,
             verbose = FALSE)
b_fl = softthresh(b, lambda =  lambda_f, gamma =  0)

One can check that $a_{fl} \neq b_{fl}$, e.g. via the comparison below. This is what confuses me.
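A quick numeric check, using the objects defined above:

max(abs(a_fl[, 1] - b_fl[, 1]))  # strictly positive whenever the two estimates differ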

Best Answer

We shouldn't expect these estimates to coincide. You would be correct if you used the constrained form of the fused lasso. However, you are using the penalized form. The crux of the issue is that the relationship between the tuning parameters in the constrained and penalized forms is data dependent, and you've changed the data.
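A one-dimensional analogue (not in the original answer, but it isolates the phenomenon): soft-thresholding, the penalized analogue, is not idempotent, whereas projection onto an interval, the constrained analogue, is.

soft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)  # prox of lambda*|.|
proj <- function(x, C) pmin(pmax(x, -C), C)                     # projection onto [-C, C]

soft(3, 1)           # 2
soft(soft(3, 1), 1)  # 1: applying the penalized operator again shrinks further
proj(3, 1)           # 1
proj(proj(3, 1), 1)  # 1: the projection is idempotent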

Define
$$
\hat{\beta}^{F}_C = \underset{\beta \in \mathbb{R}^{n}}{\arg \min} \; \|y-\beta\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{n-1}|\beta_{i} - \beta_{i+1}| \leq C
$$
and
$$
\hat{\alpha}^{F}_C = \underset{\alpha \in \mathbb{R}^{n}}{\arg \min} \; \|\hat\beta^F_C-\alpha\|_2^2 \quad \text{s.t.} \quad \sum_{i=1}^{n-1}|\alpha_{i} - \alpha_{i+1}| \leq C.
$$
Since projections are idempotent, it must be that $\hat{\beta}^{F}_C=\hat{\alpha}^{F}_C$ for all $C$.
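One can verify this numerically by solving the constrained problem directly. A minimal sketch, assuming the CVXR package (not used in the question); the helper fused_project is ours:

library(CVXR)

# Project z onto the set { beta : sum_i |beta_i - beta_{i+1}| <= C }
fused_project <- function(z, C) {
  n <- length(z)
  beta <- Variable(n)
  tv <- sum(abs(beta[2:n] - beta[1:(n - 1)]))
  prob <- Problem(Minimize(sum_squares(z - beta)), list(tv <= C))
  as.numeric(solve(prob)$getValue(beta))
}

set.seed(1)
y <- as.numeric(rmultinom(1, size = 20, prob = rep(1/12, 12)))
beta_hat  <- fused_project(y, C = 2)         # C = 2 chosen arbitrarily
alpha_hat <- fused_project(beta_hat, C = 2)  # project the projection
max(abs(beta_hat - alpha_hat))               # ~0 up to solver tolerance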

By Lagrangian duality, we may equivalently write
$$
\hat{\beta}^{F}_C = \underset{\beta \in \mathbb{R}^{n}}{\arg \min} \; \|y-\beta\|_2^2 + \lambda(C, y) \sum_{i=1}^{n-1}|\beta_{i} - \beta_{i+1}|
$$
and
$$
\hat{\alpha}^{F}_C = \underset{\alpha \in \mathbb{R}^{n}}{\arg \min} \; \|\hat\beta^F_C-\alpha\|_2^2 + \lambda(C,\hat\beta^F_C) \sum_{i=1}^{n-1}|\alpha_{i} - \alpha_{i+1}|
$$
for some tuning parameter $\lambda$ that depends on $C$ and the data. Since $y \neq \hat\beta^F_C$ in general, it does not follow that $\lambda(C, y) = \lambda(C,\hat\beta^F_C)$.
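To see the data dependence of $\lambda(C, \cdot)$ concretely, one can trace the genlasso solution path: take the constraint level $C$ implied by $\lambda = 0.2$ on $y$, then look up the knot at which the path fitted to $\hat\beta^F_C$ reaches that same level. A rough sketch (the names and the knot lookup are illustrative):

library(genlasso)

set.seed(1)
y <- as.numeric(rmultinom(1, size = 20, prob = rep(1/12, 12)))
tv <- function(b) sum(abs(diff(b)))  # the fusion penalty

fit1     <- fusedlasso1d(y)
beta_hat <- as.numeric(coef(fit1, lambda = 0.2)$beta)
C        <- tv(beta_hat)             # constraint level implied by lambda = 0.2 on y

# Along the path for beta_hat, the penalty grows as lambda shrinks; the smallest
# knot whose solution still satisfies tv(.) <= C approximates lambda(C, beta_hat).
fit2    <- fusedlasso1d(beta_hat)
tv_path <- apply(fit2$beta, 2, tv)
min(fit2$lambda[tv_path <= C])       # generally differs from 0.2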
