Let us consider the following estimator:
$$
\hat{\beta}^{F} = \underset{\beta \in \mathbb{R}^{n}}{\arg \min} \sum_{i=1}^{n}(y_{i} - \beta_{i})^{2} + \lambda_{1} \sum_{i=1}^{n-1}|\beta_{i} - \beta_{i+1}|,
$$
which is called the fused (lasso) estimator; cf. "Properties and Refinements of the Fused Lasso".
Next, if I understand correctly, fusing an already fused signal should not change the result, i.e. if
$$
\hat{\alpha}^{F} = \underset{\alpha \in \mathbb{R}^{n}}{\arg \min} \sum_{i=1}^{n}(\hat{\beta}^{F}_{i} - \alpha_{i})^{2} + \lambda_{1} \sum_{i=1}^{n-1}|\alpha_{i} - \alpha_{i+1}|,
$$
then $\hat{\alpha}^{F} = \hat{\beta}^{F}$, because projection is idempotent.
Nevertheless, let us try it in R with the genlasso package:
library("genlasso")

set.seed(1)                    # for reproducibility
n <- 20
p0 <- rep(1/12, 12)            # uniform probabilities over 12 categories
lambda_f <- 0.2

y <- as.numeric(rmultinom(1, size = n, prob = p0))

# First fused lasso fit (identity design, pure 1d fusion penalty)
a <- fusedlasso1d(y, X = diag(1, length(p0), length(p0)), gamma = 0,
                  approx = FALSE, maxsteps = 2000, minlam = 0,
                  rtol = 1e-07, btol = 1e-07, eps = 1e-4, verbose = FALSE)
a_fl <- softthresh(a, lambda = lambda_f, gamma = 0)

# Second fit: fuse the already fused signal with the same lambda
b <- fusedlasso1d(a_fl[, 1], X = diag(1, length(p0), length(p0)), gamma = 0,
                  approx = FALSE, maxsteps = 2000, minlam = 0,
                  rtol = 1e-07, btol = 1e-07, eps = 1e-4, verbose = FALSE)
b_fl <- softthresh(b, lambda = lambda_f, gamma = 0)
One can check that `a_fl` and `b_fl` differ, i.e. $a_{fl} \neq b_{fl}$. I am confused here.
Best Answer
We shouldn't expect these estimates to coincide. Your argument would be correct if you used the constrained form of the fused lasso; however, you are using the penalized form. The crux of the issue is that the mapping between the tuning parameters of the constrained and penalized forms is data-dependent, and you have changed the data between the two fits.
Define \begin{equation*} \hat{\beta}^{F}_C = \arg \min_{\beta \in \mathbb{R}^{n}} \|y-\beta\|_2^2 \text{ s.t. } \sum_{i=1}^{n-1}|\beta_{i} - \beta_{i+1}| \leq C \end{equation*} and \begin{equation*} \hat{\alpha}^{F}_C = \arg \min_{\alpha \in \mathbb{R}^{n}} \|\hat\beta^F_C-\alpha\|_2^2 \text{ s.t. } \sum_{i=1}^{n-1}|\alpha_{i} - \alpha_{i+1}| \leq C. \end{equation*} Since projections are idempotent, it must be that $\hat{\beta}^{F}_C=\hat{\alpha}^{F}_C$ for all $C$.
By Lagrangian duality, we may equivalently write \begin{equation*} \hat{\beta}^{F}_C = \arg \min_{\beta \in \mathbb{R}^{n}} \|y-\beta\|_2^2 +\lambda(C, y) \sum_{i=1}^{n-1}|\beta_{i} - \beta_{i+1}| \end{equation*} and \begin{equation*} \hat{\alpha}^{F}_C = \arg \min_{\alpha \in \mathbb{R}^{n}} \|\hat\beta^F_C-\alpha\|_2^2 + \lambda(C,\hat\beta^F_C) \sum_{i=1}^{n-1}|\alpha_{i} - \alpha_{i+1}| \end{equation*} for some tuning parameter $\lambda$ that depends on both $C$ and the data. Since $y \neq \hat\beta^F_C$ in general, it doesn't follow that $\lambda(C, y) = \lambda(C,\hat\beta^F_C)$.
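To see this concretely, here is a minimal pure-Python sketch of the $n=2$ case, where both forms have closed-form solutions (the helper names are mine, not genlasso's). With a single penalized difference, the mean of $(y_1, y_2)$ is preserved and the difference $y_1 - y_2$ is soft-thresholded by $\lambda$, so applying the map twice thresholds by $2\lambda$; the constrained form instead clips the difference to $[-C, C]$, which is a projection and hence idempotent.

```python
import math

def fused_penalized(y1, y2, lam):
    """Penalized fused lasso for n = 2 (closed form):
    minimizes (y1-b1)^2 + (y2-b2)^2 + lam*|b1-b2|.
    The mean is preserved; the difference is soft-thresholded by lam."""
    m = (y1 + y2) / 2.0
    d = y1 - y2
    d_new = math.copysign(max(abs(d) - lam, 0.0), d)  # soft-thresholding
    return (m + d_new / 2.0, m - d_new / 2.0)

def fused_constrained(y1, y2, C):
    """Constrained fused lasso for n = 2 (closed form):
    minimizes (y1-b1)^2 + (y2-b2)^2 subject to |b1-b2| <= C.
    The mean is preserved; the difference is clipped to [-C, C]."""
    m = (y1 + y2) / 2.0
    d = y1 - y2
    d_new = math.copysign(min(abs(d), C), d)  # projection = clipping
    return (m + d_new / 2.0, m - d_new / 2.0)

y = (0.0, 4.0)

# Penalized form, lam = 1: the difference shrinks 4 -> 3 -> 2,
# so applying the map twice does NOT reproduce the one-pass solution.
once = fused_penalized(*y, 1.0)        # (0.5, 3.5)
twice = fused_penalized(*once, 1.0)    # (1.0, 3.0)

# Constrained form, C = 3: the second application changes nothing.
p_once = fused_constrained(*y, 3.0)    # (0.5, 3.5)
p_twice = fused_constrained(*p_once, 3.0)  # (0.5, 3.5) again
```

This is exactly the answer's point in miniature: the constrained map is a projection onto a convex set and hence idempotent, while the penalized map is a proximal operator whose repeated application keeps shrinking the differences.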