Partial alternating minimization for solving generalized consensus optimization

nonlinear optimization

I'm trying to solve the following generalized consensus optimization problem, which is a particular kind of nonlinear nonconvex problem:

$$\begin{equation} \tag{P}
X^*=\underset{x_i \in \mathcal{X}_i,~ \forall i \in \{0,1,\dots,N\} \\ y \in \mathbb{R}^m}{\text{argmin}}\mkern+05mu \sum_{i=0}^N f_i(x_i)
\end{equation}$$

$$\begin{alignat}{2}
\text{st. } & \mathcal{X}_i = \left\{x_i \in \mathbb{R}^n \mid \psi(x_i) = 0, ~ \phi(x_i) \leq 0\right\}, \forall i \in \{0,1,\dots,N\}\\
& X - H(y) = 0
\end{alignat}$$

where $f_i:\mathbb{R}^n\mapsto\mathbb{R}$ and $h:\mathbb{R}^m\times\mathbb{R}^p\mapsto\mathbb{R}^n$ are twice continuously differentiable functions, and each $\mathcal{X}_i$ is a compact (possibly nonconvex) set. $X=(x_0,x_1,\dots,x_N)$ is the optimization vector, written as the concatenation of the subvectors $x_i$, and similarly $H(y)=(h(y,\tau_0),h(y,\tau_1),\dots,h(y,\tau_N))$.

So far, I'm able to solve the following problems:
$$\begin{equation} \tag{$P_i$}
x_i^*=\underset{x_i \in \mathcal{X}_i}{\text{argmin}}\mkern+05mu g(x_i)
\end{equation}$$

$$\text{st. } \mathcal{X}_i = \left\{x_i \in \mathbb{R}^n \mid \psi(x_i) = 0, ~ \phi(x_i) \leq 0\right\} $$
where $g:\mathbb{R}^n\mapsto\mathbb{R}$ can be any twice continuously differentiable function.
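
For context, here is a minimal sketch of how I can solve problems of that form numerically; the solver choice (SciPy's SLSQP) and the names `g`, `psi`, `phi`, `solve_subproblem` are only illustrative assumptions, not my actual implementation:

```python
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def solve_subproblem(g, psi, phi, x0):
    """Minimal sketch: solve min_x g(x)  s.t.  psi(x) = 0, phi(x) <= 0."""
    constraints = [
        NonlinearConstraint(psi, 0.0, 0.0),      # equality constraint psi(x) = 0
        NonlinearConstraint(phi, -np.inf, 0.0),  # inequality constraint phi(x) <= 0
    ]
    res = minimize(g, x0, method="SLSQP", constraints=constraints)
    return res.x

# Toy usage: g(x) = ||x||^2, psi(x) = x_0 + x_1 - 1, phi(x) = -x_0
x_star = solve_subproblem(
    lambda x: float(np.dot(x, x)),
    lambda x: x[0] + x[1] - 1.0,
    lambda x: -x[0],
    x0=np.array([0.5, 0.5]),
)
```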

I spent a lot of time reading the literature on the topic (the books of Bertsekas, among others), and I think it may be possible to solve it iteratively using the following algorithm:
$$
\begin{alignat}{2}
x_i^{k+1} &= \underset{x_i \in \mathcal{X}_i}{\text{argmin}}\mkern+05mu \mathcal{L}_\rho(x_0^k,x_1^k,\dots,x_i,\dots,x_N^k,y^k,\Lambda^k), \forall i \in \{0,1,\dots,N\} \\
&= \underset{x_i \in \mathcal{X}_i}{\text{argmin}}\mkern+05mu f_i(x_i) + {\lambda_i^k}^T (x_i-h(y^k,\tau_i)) + \frac{\rho}{2} \left\|x_i-h(y^k,\tau_i)\right\|^2, \forall i \in \{0,1,\dots,N\} \tag{$A_1^i$} \\
y^{k+1} &= \underset{y \in \mathbb{R}^m}{\text{argmin}}\mkern+05mu \mathcal{L}_\rho(X^{k+1},y,\Lambda^k) \\
&= \underset{y \in \mathbb{R}^m}{\text{argmin}}\mkern+05mu {\Lambda^k}^T (X^{k+1}-H(y)) + \frac{\rho}{2} \left\|X^{k+1}-H(y)\right\|^2 \tag{$A_2$} \\
\Lambda^{k+1} &= \Lambda^k + \rho (X^{k+1}-H(y^{k+1}))
\end{alignat}
$$

where $\Lambda=(\lambda_0,\lambda_1,\dots,\lambda_N)$, and $\mathcal{L}_\rho(X,y,\Lambda) = \sum_{i=0}^N f_i(x_i) + \Lambda^T (X-H(y)) + \frac{\rho}{2} \left\|X-H(y)\right\|^2$.
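
To make the iteration concrete, here is a rough sketch of the loop, assuming `solve_x_subproblem` wraps a constrained solver for $A_1^i$ (such as the sketch above, with $\psi$ and $\phi$ fixed) and `solve_y_subproblem` solves $A_2$; all names, signatures and the stopping rule are illustrative assumptions:

```python
import numpy as np

def admm_consensus(f, h, taus, solve_x_subproblem, solve_y_subproblem,
                   X0, y0, rho=10.0, max_iter=200, tol=1e-6):
    """Gauss-Seidel/ADMM-style iteration for the consensus problem (sketch).

    f(x, i) returns f_i(x); h(y, tau) returns h(y, tau); taus has N+1 entries.
    """
    N1 = len(taus)                            # N + 1 blocks
    X = [x.copy() for x in X0]
    y = y0.copy()
    lam = [np.zeros_like(x) for x in X0]

    for k in range(max_iter):
        # (A_1^i): update each x_i against the current h(y^k, tau_i)
        for i in range(N1):
            target = h(y, taus[i])
            X[i] = solve_x_subproblem(
                lambda x, i=i, t=target, l=lam[i]:
                    f(x, i) + l @ (x - t) + 0.5 * rho * np.dot(x - t, x - t),
                X[i])

        # (A_2): update y against the stacked X^{k+1}
        y = solve_y_subproblem(X, lam, rho, y)

        # dual update and primal residual check
        r = [X[i] - h(y, taus[i]) for i in range(N1)]
        for i in range(N1):
            lam[i] = lam[i] + rho * r[i]
        if max(np.linalg.norm(ri) for ri in r) < tol:
            break
    return X, y, lam
```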

I am able to run this iterative algorithm in practice.
It corresponds to ADMM applied to a nonlinear nonconvex problem with nonlinear nonconvex coupling constraints. Unfortunately, I have not found any convergence theorem for ADMM in this context. I would like to prove convergence of this algorithm for $\rho$ "large enough", or at least that any convergent sequence $\left\{X^k,y^k,\Lambda^k\right\}_{k=0}^\infty$ converges to a local minimum of the original problem $P$.

I tried to combine various methods and theorems, among which exact penalty methods based on the augmented Lagrangian, the partial augmented Lagrangian method, the Gauss-Seidel block method, the cyclic coordinate descent method, proximal alternating minimization methods, and even the very general Zangwill global convergence theorem. It seems that there is always some assumption that does not hold in my case. A work very similar to what I'm trying to do is On the Convergence of Alternating Direction Lagrangian Methods…. Although the proofs there are carried out for linear coupling constraints, I think it is fairly straightforward to adapt them to show that if the algorithm converges, it converges to a local minimum of the original problem $P$; however, they cannot be used to prove that it converges.

I noticed that the objective function of the subproblems $A_1^i$ is strongly convex, and that $A_2$ can be turned into a strictly convex optimization problem by using the Moreau-Yosida regularization w.r.t. $y^k$, or simply by adding a quadratic penalty $\gamma\left\|y\right\|^2$ provided that $\gamma$ is large enough. Each of the subproblems $A_1^i$ and $A_2$ then has a unique minimizer, and as far as I know this is enough to prove that the algorithm always converges, and converges to a local minimum of the original problem. I don't know if I'm right, and I would like to know whether anything can be proved without modifying the algorithm.
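
Concretely, the regularized $y$-update I have in mind would read (taking the proximal term w.r.t. the previous iterate $y^k$, with $\gamma > 0$; this is only a sketch of the modification):

$$
y^{k+1} = \underset{y \in \mathbb{R}^m}{\text{argmin}}\mkern+05mu {\Lambda^k}^T (X^{k+1}-H(y)) + \frac{\rho}{2} \left\|X^{k+1}-H(y)\right\|^2 + \frac{\gamma}{2} \left\|y-y^k\right\|^2 \tag{$A_2'$}
$$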

Best Answer

I think I managed to prove the convergence. The original problem can be rewritten as follows:

$$\begin{equation} \tag{P}
X^*=\underset{x_i \in \mathcal{X}_i,~ \forall i \in \{0,1,\dots,N\} \\ Z \in \mathcal{Z}}{\text{argmin}}\mkern+05mu \sum_{i=0}^N f_i(x_i)
\end{equation}$$

$$\begin{alignat}{2}
\text{st. } & \mathcal{X}_i = \left\{x_i \in \mathcal{D}_x \mid \psi(x_i) = 0, ~ \phi(x_i) \leq 0\right\}, \forall i \in \{0,1,\dots,N\}\\
& \mathcal{Z} = \left\{Z \in \mathcal{D}_z \mid \exists y \in \mathbb{R}^m \text{ s.t. } Z-H(y)=0 \right\}=\left\{Z \in \mathcal{D}_z \mid \text{dist}(Z,H)=0 \right\}\\
& X - Z = 0
\end{alignat}$$

where $\mathcal{D}_x \subset \mathbb{R}^n$ and $\mathcal{D}_z \subset \mathbb{R}^{n(N+1)}$ are compact sets, and $h$, $\psi$, $\phi$ are twice continuously differentiable.

Based on that formulation, the coupling constraints are now linear and the main theorem of On the Convergence of Alternating Direction Lagrangian Methods.... can be applied directly. My intuition was the right one, although I forgot to mention that problem $A_2$ is strongly convex w.r.t. $Z=H(y)$ and that $y$ never appears directly, only through $H(y)$. After convergence, $y^*$ can be computed by solving:
$$ y^*= \underset{y \in \mathbb{R}^m}{\text{argmin}}\mkern+05mu \left\|X^* - H(y) \right\|^2 $$
This problem is nonconvex, but there exists at least one solution $y^*$ such that $\left\|X^* - H(y^*) \right\|^2 = 0$.
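
As a side note, recovering $y^*$ is a standard nonlinear least-squares problem. A minimal sketch, assuming `H` returns the stacked vector $H(y)$ and `X_star` is the converged primal iterate (all names are illustrative):

```python
from scipy.optimize import least_squares

def recover_y(H, X_star, y0):
    """Recover y* by minimizing ||X* - H(y)||^2 via nonlinear least squares (sketch)."""
    res = least_squares(lambda y: X_star - H(y), y0)
    return res.x
```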
