Minimizing quadratic objective over convex set

convex optimization, karush kuhn tucker, nonlinear optimization, nonlinear system, optimization

Given vectors $x_0, \theta \in \mathbb{R}^n$, a scalar $\varepsilon\in \mathbb{R}^+$, and symmetric positive definite matrices $A, \Sigma\in \mathbb{R}^{n\times n}$, how can the following optimization problem be solved (possibly in closed form)?

$$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & (x - x_0)^\top A (x - x_0)\\ \text{subject to} & \sqrt{x^\top \Sigma x} + x^\top \theta \leq \varepsilon\end{array}$$

What I tried:
First, note that the problem is convex, since the objective is convex in $x$ and the constraint defines a convex set in $x$.

Write the Lagrangian of the problem as

$$L(x,\lambda) = (x-x_0)^\top A (x-x_0) + \lambda \left(\sqrt{x^\top \Sigma x} + x^\top \theta - \varepsilon\right). $$ The KKT conditions for this problem lead to the system:
$$
\begin{cases}
2A (x-x_0) + \lambda \left(\frac{\Sigma x}{\sqrt{x^\top\Sigma x}}+ \theta \right) = 0
\\ \lambda\left(\sqrt{x^\top \Sigma x} + x^\top \theta - \varepsilon \right) = 0 \\ \sqrt{x^\top \Sigma x} + x^\top \theta - \varepsilon \leq 0\\\lambda\geq 0
\end{cases}
$$

However, solving this system gets complicated in my calculations and I am not able to find a closed-form solution. Any suggestions?
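In the meantime, a numerical solution is easy: since $\sqrt{x^\top\Sigma x}=\|L^\top x\|_2$ for a Cholesky factor $\Sigma=LL^\top$, the constraint is a second-order cone constraint. Here is a minimal sketch (the data below are made-up illustrative values, and `cvxpy` is just one possible modelling tool):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 3

# Illustrative data: A and Sigma symmetric positive definite.
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)
C = rng.standard_normal((n, n)); Sigma = C @ C.T + n * np.eye(n)
theta = rng.standard_normal(n)
x0 = rng.standard_normal(n)
eps = 1.0

L = np.linalg.cholesky(Sigma)        # Sigma = L L^T, so x^T Sigma x = ||L^T x||^2

x = cp.Variable(n)
objective = cp.Minimize(cp.quad_form(x - x0, A))
constraints = [cp.norm(L.T @ x, 2) + theta @ x <= eps]   # second-order cone constraint
prob = cp.Problem(objective, constraints)
prob.solve()

print("optimal value:", prob.value)
print("optimal x    :", x.value)
```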

Best Answer

If you change coordinates to the eigenbasis of $A$, with the lengths of the eigenvectors equal to the square roots of the corresponding eigenvalues, the objective becomes the (squared) distance from a point. The constraint set is cut out by a quadratic inequality (and remains such after this linear coordinate change), which by a further orthonormal coordinate change (diagonalizing $\Sigma-\theta\theta^T$) and a shift you can "usually" bring to the form $d_1 x_1^2+\ldots+d_n x_n^2\leq1$. So your problem is equivalent, up to affine coordinate changes, to finding the closest point of the "standard quadratic" region above to some fixed point in space. If "the point in space" is inside the constraint set (i.e. in the original coordinates $x_0$ satisfies the constraint), then the distance is zero. Otherwise the optimum is on the boundary, and while you can reduce the resulting constrained optimization to a single-variable problem of finding the Lagrange multiplier (see this question, for example), it seems unlikely that you will get a closed-form solution (not a guarantee; perhaps the optimal value can be found, even if finding the optimal point is harder).
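To make the single-variable reduction concrete: if $x_0$ is feasible we are done; otherwise the optimum is on the boundary, and (using the squared form of the constraint derived below) stationarity gives $x(\mu)=\big(A+\mu(\Sigma-\theta\theta^T)\big)^{-1}(Ax_0-\mu\varepsilon\theta)$, leaving a scalar root-finding problem in $\mu$. Here is a hedged sketch, not a closed form: it assumes $\Sigma-\theta\theta^T$ is positive definite, that $\varepsilon-\theta^Tx_0\geq 0$, and it re-checks $\varepsilon-\theta^Tx\geq 0$ at the computed point (needed for the squared constraint to match the original one).

```python
import numpy as np
from scipy.optimize import brentq

def solve_via_multiplier(A, Sigma, theta, eps, x0):
    """Sketch: boundary solution via a scalar multiplier search.

    Assumes Sigma - theta theta^T is positive definite (the nondegenerate
    case) and eps - theta^T x0 >= 0, so infeasibility of x0 shows up as a
    positive residual of the squared constraint.
    """
    S = Sigma - np.outer(theta, theta)

    def feasible(x):                      # original constraint
        return np.sqrt(x @ Sigma @ x) + theta @ x <= eps

    if feasible(x0):                      # constraint inactive: x0 is optimal
        return x0

    def g(x):                             # squared-constraint residual
        return x @ S @ x + 2 * eps * theta @ x - eps**2

    def x_of_mu(mu):                      # stationary point for multiplier mu
        return np.linalg.solve(A + mu * S, A @ x0 - mu * eps * theta)

    # bracket a sign change of g(x(mu)), then root-find on the scalar mu
    hi = 1.0
    while g(x_of_mu(hi)) > 0:
        hi *= 2.0
    mu = brentq(lambda m: g(x_of_mu(m)), 0.0, hi)
    x = x_of_mu(mu)
    assert eps - theta @ x >= -1e-9       # needed for squaring to be valid
    return x
```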

Here are some details:

The initial coordinate change comes from $A=U^T \Delta U=(\Delta^{1/2}U)^T (\Delta^{1/2}U)=M^TM$ (with $U$ orthogonal and $\Delta$ the diagonal matrix of eigenvalues), so $(x-x_0)^TA(x-x_0)=(\Delta^{1/2}U(x-x_0))^T (\Delta^{1/2}U (x-x_0))$, and we set $y=\Delta^{1/2}U(x-x_0)=M(x-x_0)$ so that the objective is $y^Ty$. The original $x$ is recoverable as $x=U^T\Delta^{-1/2}y+x_0=M^{-1}y+x_0$.
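Numerically, this first step can be sanity-checked as follows (made-up data; `numpy.linalg.eigh` returns $A=V\,\mathrm{diag}(\delta)\,V^T$, so $U=V^T$ in the notation above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)   # SPD example
x0 = rng.standard_normal(n)

delta, V = np.linalg.eigh(A)             # A = V diag(delta) V^T
M = np.diag(np.sqrt(delta)) @ V.T        # M = Delta^{1/2} U,  A = M^T M
assert np.allclose(M.T @ M, A)

x = rng.standard_normal(n)
y = M @ (x - x0)                         # change of variables
assert np.isclose((x - x0) @ A @ (x - x0), y @ y)   # objective = ||y||^2
assert np.allclose(np.linalg.solve(M, y) + x0, x)   # x recovered from y
```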

The constraint set is

$\sqrt{x^T \Sigma x} + x^T \theta \leq \varepsilon$

$\sqrt{x^T \Sigma x} \leq (\varepsilon - x^T \theta)$

Since the left-hand side is non-negative, any feasible $x$ must also satisfy $\varepsilon - x^T \theta \geq 0$; keeping that side condition in mind, squaring gives the equivalent inequality

$x^T \Sigma x \leq (\varepsilon - x^T \theta)^2=\varepsilon^2-2 \varepsilon x^T\theta+ x^T \theta \theta^T x$

$x^T(\Sigma - \theta \theta^T )x +2 \varepsilon x^T\theta -\varepsilon^2\leq 0$

$(M^{-1}y+x_0)^T(\Sigma- \theta \theta^T)(M^{-1}y+x_0) +2 \varepsilon (M^{-1}y+x_0)^T\theta -\varepsilon^2\leq 0$

Collecting terms, this reads

$$y^T(M^{-1})^T(\Sigma- \theta \theta^T) (M^{-1})y$$

$$+\, y^T (M^{-1})^T (\Sigma- \theta \theta^T)x_0+ x_0^T (\Sigma- \theta \theta^T) (M^{-1}) y+ 2 \varepsilon (M^{-1}y)^T\theta$$

$$+\, x_0^T (\Sigma- \theta \theta^T) x_0+ 2 \varepsilon x_0^T\theta -\varepsilon^2\leq 0$$

Or $$y^T Q y + y^T\beta +c\leq 0,$$ where $Q=(M^{-1})^T(\Sigma- \theta \theta^T) (M^{-1})$, $\beta=(M^{-1})^T\left(2(\Sigma-\theta\theta^T)x_0+2\varepsilon\theta\right)$, and $c=x_0^T (\Sigma- \theta \theta^T) x_0+ 2 \varepsilon x_0^T\theta -\varepsilon^2$. Now, $Q$ is symmetric, so $Q=V^TDV$ with orthogonal $V$ and diagonal $D$, and we set $z=Vy$. The objective is still $y^Ty=z^Tz$. The constraint is now in the form

$z^TDz+z^T\gamma+k \leq 0$, where $\gamma=V\beta$ and $k=c$.
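In code (same made-up data conventions as before; `eigh` returns $Q=W\,\mathrm{diag}(D)\,W^T$, so $V=W^T$ in the notation above), one can check that the constraint value is preserved through these changes of variables:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)
C = rng.standard_normal((n, n)); Sigma = C @ C.T + n * np.eye(n)
theta = 0.1 * rng.standard_normal(n)
x0 = rng.standard_normal(n)
eps = 1.0

delta, VA = np.linalg.eigh(A)
M = np.diag(np.sqrt(delta)) @ VA.T               # A = M^T M
Minv = np.linalg.inv(M)

S = Sigma - np.outer(theta, theta)               # Sigma - theta theta^T
Q = Minv.T @ S @ Minv
beta = Minv.T @ (2 * S @ x0 + 2 * eps * theta)   # linear term in y
c = x0 @ S @ x0 + 2 * eps * theta @ x0 - eps**2  # constant term

D, W = np.linalg.eigh(Q)                         # Q = W diag(D) W^T, so V = W^T
x = rng.standard_normal(n)
y = M @ (x - x0)
z = W.T @ y                                      # z = V y
gamma, k = W.T @ beta, c

lhs_original = x @ S @ x + 2 * eps * theta @ x - eps**2
lhs_z = z @ (D * z) + gamma @ z + k
assert np.isclose(lhs_original, lhs_z)           # same constraint value
```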

Now we pass to individual coordinates and play "complete the square":

$\sum_i \left(D_i z_i^2 + \gamma_i z_i\right) \leq -k$

For those $D_i$ that are non-zero we get

$D_i z_i^2 + \gamma_i z_i=D_i \left(z_i+\frac{\gamma_i}{2D_i}\right)^2-\frac{\gamma_i^2}{4D_i}=D_i w_i^2+ n_i$, where $w_i=z_i+\frac{\gamma_i}{2D_i}$ and $n_i=-\frac{\gamma_i^2}{4D_i}$.

So if all $D_i$ are non-zero (i.e. if $\Sigma-\theta\theta^T$ is non-singular), the constraint becomes $\sum_i D_i w_i^2\leq N$ with $N=-k-\sum_i n_i$, and if $N>0$ the constraint is indeed of the form $\sum_i d_i w_i^2\leq 1$ (where $d_i=D_i/N$; if $N<0$ the direction of the inequality flips after dividing by $N$).

The objective is $z^Tz=(w-w_0)^T(w-w_0)$ where $(w_0)_i=\frac{\gamma_i}{2D_i}$.

So, overall we are optimizing distance from a point to a region defined by a quadratic, which can be somewhat standardized and, depending on various parameter values, "often" completely diagonalized (in practice one would need to pay attention to various conditioning issues).
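For completeness, here is a sketch of the whole reduction down to the $w$-coordinates, again on made-up data and assuming the nondegenerate case (all $D_i\neq 0$, $N>0$); it checks that feasibility and the objective value agree between the original description and the normalized one (the extra condition $\varepsilon-\theta^Tx\geq 0$ is carried along, since it was used when squaring the constraint):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)
C = rng.standard_normal((n, n)); Sigma = C @ C.T + n * np.eye(n)
theta = 0.1 * rng.standard_normal(n)
x0 = rng.standard_normal(n)
eps = 1.0

# Step 1: A = M^T M, y = M (x - x0), objective = ||y||^2.
delta, VA = np.linalg.eigh(A)
M = np.diag(np.sqrt(delta)) @ VA.T
Minv = np.linalg.inv(M)

# Step 2: constraint as y^T Q y + beta^T y + c <= 0, then diagonalize Q.
S = Sigma - np.outer(theta, theta)
Q = Minv.T @ S @ Minv
beta = Minv.T @ (2 * S @ x0 + 2 * eps * theta)
c = x0 @ S @ x0 + 2 * eps * theta @ x0 - eps**2
D, W = np.linalg.eigh(Q)
gamma, k = W.T @ beta, c

# Step 3: complete the square (assumes all D_i != 0 and N > 0).
shift = gamma / (2 * D)                  # w = z + shift
nconst = -gamma**2 / (4 * D)
N = -k - nconst.sum()
d = D / N
w0 = shift                               # objective = ||w - w0||^2

# Check: feasibility and objective agree in both descriptions.
x = rng.standard_normal(n)
w = W.T @ (M @ (x - x0)) + shift
orig_feasible = np.sqrt(x @ Sigma @ x) + theta @ x <= eps
norm_feasible = (d * w**2).sum() <= 1 and eps - theta @ x >= 0
assert orig_feasible == norm_feasible
assert np.isclose((x - x0) @ A @ (x - x0), ((w - w0)**2).sum())
```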
