Is there a closed form solution for the following least squares problem:
$$ \min_\mathbf{x}\|\mathbf{a+Bx}\|^2 ~~\text{s.t.}~~\|\mathbf{x}\|^2 \leq \alpha^2$$
where $\mathbf{a} \in \mathbb{C}^{M\times 1}$ and $\mathbf{B} \in \mathbb{C}^{M\times N}$.
[Math] Least squares with a quadratic inequality constraint
Related Solutions
Answers
(Q1)
$\|\mathbf{x}(\mu)\|^2$ is monotonically decreasing without jump discontinuities for $\mu \geq 0$, with $\lim_{\mu \to \infty}\|\mathbf{x}(\mu)\|^2 = |\gamma|^2/\|\mathbf{c}\|^2$. If, for some fixed $\mu$ with $0 \leq \mu < \infty$, the solution is $\mathbf{x}^* = \gamma \mathbf{c} /\|\mathbf{c}\|^2$, then this is the solution for all non-negative values of $\mu$. For $\mu < 0$, monotonicity does not hold on the whole domain of negative real numbers, and jump discontinuities can appear.
(Q2)
Yes, see below.
(Q3)
Yes, and the global optimal solution is given by $\mathbf{x}^* = \mathbf{x}(\mu^*)$, with $\mathbf{x}(\mu)$ from (6). This also holds for the case $\alpha^2 = |\gamma|^2/\|\mathbf{c}\|^2$, where the only feasible solution is the one for $\mu \to \infty$; in practice no limit needs to be taken, since the solution is then already determined to be $\mathbf{x}^* = \gamma \mathbf{c} /\|\mathbf{c}\|^2$.
Strategy for numerical solution
For $\alpha^2 = |\gamma|^2/\|\mathbf{c}\|^2$, no calculation is necessary since the only allowed solution is $\mathbf{x}^* = \gamma \mathbf{c} /\|\mathbf{c}\|^2$.
For $\alpha^2 > |\gamma|^2/\|\mathbf{c}\|^2$, the answers to (Q1) and (Q2) simplify the numerical procedure: calculate $\mathbf{x}(\mu)$ according to (6), starting with $\mu = 0$, and increase $\mu$ until (C1) holds.
For $\alpha^2 < |\gamma|^2/\|\mathbf{c}\|^2$, obviously no solution exists.
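Equations (3), (6), (C1), and (C2) are not reproduced in this excerpt; as a hedged stand-in (consistent with \eqref{A1} in the proofs below), the following numpy sketch assumes $\mathbf{x}(\mu)$ solves the KKT system with rows $(A^{\dagger}A+\mu I)\mathbf{x} + \mathbf{c}\lambda = \mathbf{y}$ and $\mathbf{c}^{\dagger}\mathbf{x} = \gamma$, and uses synthetic data:

```python
import numpy as np

def x_of(mu, A, y, c, gamma):
    # Solve the assumed KKT system
    # [[A^H A + mu I, c], [c^H, 0]] [x; lam] = [y; gamma]
    # and return the x-part only.
    n = A.shape[1]
    K = np.zeros((n + 1, n + 1), dtype=complex)
    K[:n, :n] = A.conj().T @ A + mu * np.eye(n)
    K[:n, n] = c
    K[n, :n] = c.conj()
    rhs = np.concatenate([y, [gamma]])
    return np.linalg.solve(K, rhs)[:n]

rng = np.random.default_rng(0)
M, N = 7, 4
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
gamma = 1.0 + 0.5j

norm_inf_sq = abs(gamma) ** 2 / np.linalg.norm(c) ** 2     # limit of ||x(mu)||^2
norm0_sq = np.linalg.norm(x_of(0.0, A, y, c, gamma)) ** 2  # value at mu = 0
alpha_sq = 0.5 * (norm0_sq + norm_inf_sq)  # chosen so the search is active

# ||x(mu)||^2 decreases monotonically for mu >= 0 (Q1), so bracket the
# crossing by doubling, then bisect until (C1) holds with equality.
lo, hi = 0.0, 1.0
while np.linalg.norm(x_of(hi, A, y, c, gamma)) ** 2 > alpha_sq:
    hi *= 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if np.linalg.norm(x_of(mid, A, y, c, gamma)) ** 2 > alpha_sq:
        lo = mid
    else:
        hi = mid
x_star = x_of(hi, A, y, c, gamma)
```

With $\alpha^2$ strictly between $\|\mathbf{x}(0)\|^2$ and $|\gamma|^2/\|\mathbf{c}\|^2$, the bracket-then-bisect search terminates because $\|\mathbf{x}(\mu)\|^2$ decreases monotonically to its limit (Q1).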
Proofs/Reasons
(Q1)
We first investigate the asymptotic solution for large $\mu$. As a first step we divide the first row of (3) by $\mu$: $$ \left(A^{\dagger} A/\mu + I\right) \mathbf{x}(\mu) + \mathbf{c} \lambda(\mu)/\mu = \mathbf{y}/\mu.\tag{A1}\label{A1} $$ Multiplying \eqref{A1} from the left with $\mathbf{c}^{\dagger}$ and using both the constraint (C2) and the fact that we look for solutions with bounded norm only, we obtain the asymptotic expression $$ \lim_{\mu \to \infty}\lambda(\mu)/\mu = -\gamma/\|\mathbf{c}\|^2.\tag{A2}\label{A2} $$ With \eqref{A2} in the asymptotic limit of \eqref{A1}, we find $$ \lim_{\mu \to \infty} \mathbf{x}(\mu) = \gamma \mathbf{c} /\|\mathbf{c}\|^2=:\mathbf{x}_{\infty},\tag{A3}\label{A3} $$ and thus $$ \lim_{\mu \to \infty} \|\mathbf{x}(\mu)\|^2 = |\gamma|^2/\|\mathbf{c}\|^2.\tag{A4}\label{A4} $$ To investigate the monotonicity of $\|\mathbf{x}(\mu)\|^2$, we differentiate (3) w.r.t. $\mu$; here and below, $u'(\mu) \equiv \mathrm{d}\, u(\mu)/\mathrm{d}\mu$ denotes the derivative w.r.t. $\mu$. The first row yields $$ \mathbf{x}(\mu) + (A^{\dagger}A + \mu I) \mathbf{x}'(\mu) + \mathbf{c}\lambda'(\mu) = 0,\tag{A5}\label{A5} $$ and the second row yields the trivial condition $\mathbf{c}^{\dagger} \mathbf{x}'(\mu) = 0$.
Multiplying \eqref{A5} with $\mathbf{c}^{\dagger}$ from the left, we obtain $$ \gamma + \mathbf{c}^{\dagger} A^{\dagger}A\mathbf{x}'(\mu) + \|\mathbf{c}\|^2\lambda'(\mu) = 0,\tag{A6}\label{A6} $$ which yields $$ \lambda'(\mu) = -\dfrac{ \gamma + \mathbf{c}^{\dagger}A^{\dagger}A \mathbf{x}'(\mu)}{\|\mathbf{c}\|^2}.\tag{A7}\label{A7} $$ Using \eqref{A7} in \eqref{A5}, we get an equation that determines $\mathbf{x}'(\mu)$: $$ \left(\left(I - \dfrac{\mathbf{c}\mathbf{c}^{\dagger}}{\|\mathbf{c}\|^2}\right)A^{\dagger}A + \mu I\right) \mathbf{x}'(\mu) = \frac{\gamma \mathbf{c}}{\|\mathbf{c}\|^2} - \mathbf{x}(\mu).\tag{A8}\label{A8} $$ On the left-hand side the orthogonal projector $P:=I - \mathbf{c}\mathbf{c}^{\dagger}/\|\mathbf{c}\|^2$ appears, which maps elements of $\mathbb{C}^n$ to $\mathcal{U} = \{\mathbf{x} \in \mathbb{C}^n \mid \mathbf{c}^{\dagger}\mathbf{x} = 0\}\subset \mathbb{C}^n$, the subspace orthogonal to the one-dimensional subspace along $\mathbf{c}$. Multiplying \eqref{A8} with $P$ from the left, we get $$ \left(PA^{\dagger}A + \mu P \right) \mathbf{x}'(\mu) = -P\mathbf{x}(\mu).\tag{A9}\label{A9} $$ Since $\mathbf{c}^{\dagger} \mathbf{x}'(\mu) = 0$, which is equivalent to $P\mathbf{x}'(\mu) = \mathbf{x}'(\mu)$, we can further write $$ \left(PA^{\dagger}AP + \mu P \right) \mathbf{x}'(\mu) = -P\mathbf{x}(\mu),\tag{A10}\label{A10} $$ which is a linear equation on $\mathcal{U}$ only and completely determines $\mathbf{x}'(\mu)$. We expand \eqref{A10} in an orthonormal basis of $\mathcal{U}$ and use the following notation: $\mathbf{x}_{\mathcal{U}}(\mu)$, $\mathbf{x}'_{\mathcal{U}}(\mu)$ for the representations of $P\mathbf{x}(\mu)$, $\mathbf{x}'(\mu)$ in $\mathcal{U}$; $I_{\mathcal{U}}$ for the identity in $\mathcal{U}$; and $S_{\mathcal{U}}$ for the representation of $PA^{\dagger}AP$ in $\mathcal{U}$.
We can write $$ \mathbf{x}'_{\mathcal{U}}(\mu) = -\left(S_{\mathcal{U}} + \mu I_{\mathcal{U}}\right)^{-1}\mathbf{x}_{\mathcal{U}}(\mu),\tag{A11}\label{A11} $$ and it follows that \begin{align} \frac{\mathrm{d}}{\mathrm{d}\mu}\|\mathbf{x}(\mu)\|^2 & = \mathbf{x}'(\mu)^{\dagger}\mathbf{x}(\mu) + \mathbf{x}(\mu)^{\dagger}\mathbf{x}'(\mu)\tag{A12}\label{A12}\\ &=\mathbf{x}'(\mu)^{\dagger}P P \mathbf{x}(\mu) + \mathbf{x}(\mu)^{\dagger}PP\mathbf{x}'(\mu)\\ &=\mathbf{x}_{\mathcal{U}}'(\mu)^{\dagger}\mathbf{x}_{\mathcal{U}}(\mu) + \mathbf{x}_{\mathcal{U}}(\mu)^{\dagger}\mathbf{x}_{\mathcal{U}}'(\mu)\\ &=-2\mathbf{x}_{\mathcal{U}}(\mu)^{\dagger}\left(S_{\mathcal{U}} + \mu I_{\mathcal{U}}\right)^{-1}\mathbf{x}_{\mathcal{U}}(\mu).\tag{A13}\label{A13} \end{align} It can be readily seen from its definition that $S_{\mathcal{U}}$ is positive definite; hence $\left(S_{\mathcal{U}} + \mu I_{\mathcal{U}}\right)^{-1}$ is positive definite for $\mu \geq 0$. Since $A^{\dagger}A$ is also positive definite, we obtain from (6) together with (7) that $\mathbf{x}(\mu)$ does not diverge for $\mu\geq 0$. Therefore, from \eqref{A13} we obtain $\frac{\mathrm{d}}{\mathrm{d}\mu}\|\mathbf{x}(\mu)\|^2 \leq 0$ for $\mu \geq 0$; that is, $\|\mathbf{x}(\mu)\|^2$ is monotonically decreasing without jump discontinuities for $\mu \geq 0$. However, the monotonicity is not strict, because $\mathrm{d}\|\mathbf{x}(\mu)\|^2/\mathrm{d}\mu$ vanishes when $\mathbf{x}_{\mathcal{U}}(\mu) = 0$. The only solution compatible with both constraints (C1) and (C2) that yields $\mathbf{x}_{\mathcal{U}}(\mu) = 0$ is $\mathbf{x}(\mu) = \gamma\mathbf{c}/\|\mathbf{c}\|^2 = \mathbf{x}_{\infty}$. Since this is also the asymptotic solution from \eqref{A3}, we have $\lim_{\mu \to \infty} \mathrm{d}\|\mathbf{x}(\mu)\|^2/\mathrm{d}\mu = 0$. Moreover, if (6) yields $\mathbf{x}_{\infty}$ for some finite $\mu$, it must be the solution for all non-negative $\mu$.
For $\mu < 0$, $\left(S_{\mathcal{U}} + \mu I_{\mathcal{U}}\right)^{-1}$ is not guaranteed to be positive definite, and hence monotonicity is not guaranteed on the whole domain of negative real numbers. Moreover, jump discontinuities can appear where $-\mu$ matches an eigenvalue of $A^{\dagger}A$ or $S_{\mathcal{U}}$.
(Q2)
We have to distinguish two cases:
$\alpha^2 > |\gamma|^2/\|\mathbf{c}\|^2$:
With the monotonicity of $\|\mathbf{x}(\mu)\|^2$ for $\mu \geq 0$ and \eqref{A4}, we can find a $\mu^\star$ with $g(\mathbf{x}(\mu^\star)) < 0$. Therefore, Slater's condition holds, which guarantees strong duality of the convex optimization problem and thus the existence of a KKT point $(\mathbf{x}^*,\mu^*, \lambda^*)$, where $\mathbf{x}^*$ is a local optimum of the optimization problem and $(\mu^*,\lambda^*)$ one of the corresponding dual problem. Due to convexity and strong duality, the following KKT conditions are sufficient for the global optimum: \begin{align} g(\mathbf{x}^*) &\leq 0,\tag{A14}\label{A14}\\ h(\mathbf{x}^*) &= 0,\tag{A15}\label{A15}\\ \mu^* &\geq 0,\tag{A16}\label{A16}\\ \mu^*g(\mathbf{x}^*) &= 0.\tag{A17}\label{A17} \end{align} \eqref{A16} answers the question for this case.

$\alpha^2 = |\gamma|^2/\|\mathbf{c}\|^2$:
Slater's condition does not hold; however, the only feasible solution is $\mathbf{x} = \mathbf{x}_{\infty}$, the asymptotic solution. Therefore, we have $\mu^* \to \infty$.
For both cases we get $\mu^* \geq 0$.
(Q3)
For $\mathbf{x}(\mu)$ from (6), \eqref{A15} holds. The remaining task is to find the optimal $\mu^* \geq 0$ such that the remaining KKT conditions hold. To satisfy \eqref{A17}, we require $\mu^* = 0$ if $g(\mathbf{x}(0)) < 0$, and $\mu^* > 0$ otherwise. In the latter case we require $\|\mathbf{x}(\mu^*)\|^2 = \alpha^2$, so that \eqref{A14} holds. Since $\|\mathbf{x}(\mu)\|^2$ is monotonically decreasing for $\mu \geq 0$, we find $\mu^* = \inf\, \{\mu \geq 0 \mid g(\mathbf{x}(\mu)) \leq 0\}$, and the global optimal solution is given by $\mathbf{x}^* = \mathbf{x}(\mu^*)$.
Your problem can be formulated as a convex quadratic programming problem with a linear equality constraint and nonnegativity constraints on $x$:
$\min \frac{1}{2}\left( x^{T}(A^{T}A)x - 2(b^{T}A)x + b^{T}b \right)$
subject to
$\sum_{i=1}^{n} x_{i}=1$
$x \geq 0$.
As a practical matter, you are probably best off using a well-written library routine for linearly constrained quadratic programming; many are available in a variety of programming languages. If you need to implement this yourself, an active-set QP solver is relatively straightforward to implement.
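As a self-contained illustration of the problem above (not of an active-set method), here is a projected-gradient sketch in numpy; `project_simplex` and all data are stand-ins introduced here:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {x : x >= 0, sum(x) = 1}
    # (the standard sort-based algorithm, cf. Duchi et al. 2008).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

rng = np.random.default_rng(3)
m, n = 12, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Projected gradient descent on 0.5 * ||A x - b||^2 over the simplex,
# with the classical 1/L step size, L = ||A||_2^2.
L = np.linalg.norm(A, 2) ** 2
x0 = np.full(n, 1.0 / n)
x = x0.copy()
for _ in range(2000):
    x = project_simplex(x - (A.T @ (A @ x - b)) / L)
```

Every iterate is feasible by construction, and with step size $1/L$ the objective is non-increasing; a library QP solver, as recommended above, will of course converge much faster.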
Best Answer
It depends.
Consider the unconstrained least squares problem
$$ \min_\mathbf{x}\|\mathbf{a+Bx}\|^2. $$
This problem has the solution $\mathbf{x}^\star = - (\mathbf{B}^H\mathbf{B})^{-1}\mathbf{B}^H\mathbf{a}$, assuming $\mathbf{B}$ has full column rank. Clearly, $\mathbf{x}^\star$ is also the solution to your problem if $\|\mathbf{x}^\star\|^2 \leq \alpha^2$. In this case, you have an analytical solution.
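A minimal numpy sketch of this feasibility check, with synthetic $\mathbf{B}$, $\mathbf{a}$ and a hypothetical $\alpha$ (using a least-squares solve rather than forming the normal equations, purely for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 3
B = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
alpha = 2.0  # hypothetical constraint radius

# Unconstrained minimizer of ||a + B x||^2, i.e. x* = -(B^H B)^{-1} B^H a,
# computed via the least-squares problem B x = -a.
x_star, *_ = np.linalg.lstsq(B, -a, rcond=None)

# If the constraint is inactive, x* already solves the constrained problem.
inactive = np.linalg.norm(x_star) ** 2 <= alpha ** 2
```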
However, if $\|\mathbf{x}^\star\|^2 > \alpha^2$, you have to consider the constrained problem
$$ \min_\mathbf{x}\|\mathbf{a+Bx}\|^2 + \lambda \|\mathbf{x}\|^2, $$ where $\lambda > 0$ is the Lagrange multiplier. This problem has the solution
$$\mathbf{x}^\star = - (\mathbf{B}^H\mathbf{B} + \lambda \mathbf{I})^{-1}\mathbf{B}^H\mathbf{a}.$$
For the proper Lagrange multiplier $\lambda$, $\|\mathbf{x}^\star\|^2 = \alpha^2$ holds. There is no closed-form expression for this $\lambda$, but since $\|\mathbf{x}^\star\|$ is monotonically decreasing in $\lambda$, it can be found by bisection search.
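A minimal numpy sketch of that bisection, with synthetic data and $\alpha$ chosen so the constraint is active:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 8, 4
B = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)

def x_of(lam):
    # x(lambda) = -(B^H B + lambda I)^{-1} B^H a
    return -np.linalg.solve(B.conj().T @ B + lam * np.eye(N), B.conj().T @ a)

# Pick alpha below the unconstrained norm so the constraint is active.
alpha = 0.5 * np.linalg.norm(x_of(0.0))

# ||x(lambda)|| decreases monotonically in lambda: bracket by doubling,
# then bisect until ||x(lambda)|| = alpha.
lo, hi = 0.0, 1.0
while np.linalg.norm(x_of(hi)) > alpha:
    hi *= 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if np.linalg.norm(x_of(mid)) > alpha:
        lo = mid
    else:
        hi = mid
lam_star = hi
x_star = x_of(lam_star)
```

Each bisection step is one linear solve; for large problems, an eigendecomposition of $\mathbf{B}^H\mathbf{B}$ computed once would make each $\lambda$-evaluation cheap.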