[Math] Least Squares with Euclidean ($ {L}_{2} $) Norm Constraint

convex optimization, least squares, linear algebra, nonlinear optimization, optimization

Suppose I have a set of samples $(x_i,y_i), 1 \leq i \leq n$. I am interested in solving the following optimization problem:
$$
\min_{a} \sum_{i=1}^n (y_i-a^\top x_i)^2, \quad \text{s.t. } \|a\|_{2} = 1.
$$

If we assume that $\sum_i x_i x_i^\top$ is invertible, I am wondering if one can prove that the solution to the above optimization problem is
$$
a^\ast=\frac{\left(\sum_i x_i x_i^\top \right)^{-1} (\sum_i x_i y_i)}{\|\left(\sum_i x_i x_i^\top \right)^{-1} (\sum_i x_i y_i)\|}
$$
Does the above solution still hold if we relax the constraint to be $\|a\|_{2} \leq 1$? Assuming that the above solution does not hold, in this inequality-constrained case, is running projected gradient descent guaranteed to find the true minimum, since the problem is convex?

Best Answer

I will try to solve it fully later, but just think of the following case: what if the Least Squares solution already has an $ {L}_{2} $ norm which is less than 1?

Your solution would rescale it to unit norm, which would generally not be optimal.

So the solution must be composed of 2 cases: one where the LS solution obeys the relaxed constraint, and one where it does not.
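
To see this numerically, here is a minimal sketch (assuming NumPy and synthetic data; the dimensions and the vector `a_true` are arbitrary illustrative choices) comparing the unconstrained LS solution with its normalized version when the LS solution already lies inside the unit ball:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))                   # rows are the samples x_i^T
a_true = np.array([0.3, -0.2, 0.1])           # small, so the LS solution has norm < 1
y = X @ a_true + 0.01 * rng.normal(size=n)

a_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # unconstrained Least Squares solution
a_scaled = a_ls / np.linalg.norm(a_ls)        # the formula proposed in the question

loss = lambda a: np.sum((y - X @ a) ** 2)
print(np.linalg.norm(a_ls))                   # < 1, hence feasible for ||a||_2 <= 1
print(loss(a_ls), loss(a_scaled))             # the rescaled vector has a larger loss
```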

By the way, your solution, which is basically a projection onto the Euclidean Unit Ball, can serve as the projection step in a numerical solution using the Projected Gradient Method.
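
For reference, here is a minimal sketch of that Projected Gradient Method (assuming NumPy; the constant step size $ 1 / L $ with $ L = \left\| A \right\|_{2}^{2} $ is one standard choice, and since the objective and the constraint set are both convex, the iterates converge to the global minimum):

```python
import numpy as np

def projected_gradient(A, b, n_iter=1000):
    """Projected Gradient Descent for min 0.5 * ||A x - b||^2 s.t. ||x||_2 <= 1."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / L, L = Lipschitz constant of the gradient
    for _ in range(n_iter):
        x = x - step * (A.T @ (A @ x - b))    # gradient step on the smooth objective
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                         # projection onto the Euclidean unit ball
            x = x / nrm
    return x
```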

My Solution

First, writing $ A $ for the matrix whose $ i $-th row is $ {x}_{i}^{T} $, $ b $ for the vector of the $ {y}_{i} $, and $ x $ for the unknown vector (the question's $ a $), let's rewrite the problem as:

$$ \begin{alignat*}{3} \text{minimize} & \quad & \frac{1}{2} \left\| A x - b \right\|_{2}^{2} \\ \text{subject to} & \quad & {x}^{T} x \leq 1 \end{alignat*} $$

The Lagrangian is given by:

$$ L \left( x, \lambda \right) = \frac{1}{2} \left\| A x - b \right\|_{2}^{2} + \frac{\lambda}{2} \left( {x}^{T} x - 1 \right) $$

The KKT Conditions are given by:

$$ \begin{align*} \nabla_{x} L \left( x, \lambda \right) = {A}^{T} \left( A x - b \right) + \lambda x & = 0 && \text{(1) Stationarity} \\ \lambda \left( {x}^{T} x - 1 \right) & = 0 && \text{(2) Complementary Slackness} \\ {x}^{T} x & \leq 1 && \text{(3) Primal Feasibility} \\ \lambda & \geq 0 && \text{(4) Dual Feasibility} \end{align*} $$

From (1) one could see that the optimal solution is given by:

$$ \hat{x} = {\left( {A}^{T} A + \lambda I \right)}^{-1} {A}^{T} b $$

which is basically the solution of a Tikhonov Regularized Least Squares problem.
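
In code this is a single linear solve (a minimal sketch assuming NumPy, for a given value of $ \lambda $; solving the normal equations directly rather than forming the explicit inverse):

```python
import numpy as np

def tikhonov_solution(A, b, lam):
    """Compute (A^T A + lam * I)^{-1} A^T b via a linear solve, without the explicit inverse."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
```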

Now, from (2), either $ \lambda = 0 $ (so $ \hat{x} $ is the plain Least Squares solution, which must then satisfy $ {x}^{T} x \leq 1 $), or $ \lambda > 0 $ and the constraint is active, i.e. $ {x}^{T} x = 1 $.

So one needs to check the Least Squares solution first.
If $ \left\| {\left( {A}^{T} A \right)}^{-1} {A}^{T} b \right\|_{2} \leq 1 $ then $ \hat{x} = {\left( {A}^{T} A \right)}^{-1} {A}^{T} b $.

Otherwise, one needs to find the optimal $ \hat{\lambda} > 0 $ such that $ \left\| {\left( {A}^{T} A + \hat{\lambda} I \right)}^{-1} {A}^{T} b \right\|_{2} = 1 $.

For $ \lambda \geq 0 $ the function:

$$ f \left( \lambda \right) = \left\| {\left( {A}^{T} A + \lambda I \right)}^{-1} {A}^{T} b \right\| $$

is monotonically decreasing and bounded below by $ 0 $.
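
One way to verify this is through the SVD $ A = U \Sigma {V}^{T} $ (with singular values $ {\sigma}_{i} $ and left singular vectors $ {u}_{i} $):

$$ {f \left( \lambda \right)}^{2} = \left\| V {\left( {\Sigma}^{T} \Sigma + \lambda I \right)}^{-1} {\Sigma}^{T} {U}^{T} b \right\|_{2}^{2} = \sum_{i} \frac{{\sigma}_{i}^{2} {\left( {u}_{i}^{T} b \right)}^{2}}{{\left( {\sigma}_{i}^{2} + \lambda \right)}^{2}} $$

Each term is decreasing in $ \lambda $ on $ \left[ 0, \infty \right) $ and vanishes as $ \lambda \to \infty $.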

(Plot of $ f \left( \lambda \right) $ versus $ \lambda $, illustrating the monotonic decrease.)

Hence, all that is needed is to find the value of $ \lambda $ with $ f \left( \lambda \right) = 1 $ by any one-dimensional root-finding method, starting at $ \lambda = 0 $.

Basically, the method solves a Tikhonov Regularized Least Squares problem iteratively.
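
A minimal sketch of such a solver (assuming NumPy, that $ {A}^{T} A $ is invertible as in the question, and using bisection as the one-dimensional root finder):

```python
import numpy as np

def constrained_ls(A, b, tol=1e-10):
    """Solve min 0.5 * ||A x - b||^2 s.t. ||x||_2 <= 1 via the KKT conditions."""
    def tikhonov(lam):
        # Tikhonov Regularized Least Squares solution for a given lambda
        return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

    x_ls = tikhonov(0.0)                         # unconstrained Least Squares solution
    if np.linalg.norm(x_ls) <= 1.0:              # case 1: constraint inactive (lambda = 0)
        return x_ls

    lam_lo, lam_hi = 0.0, 1.0                    # case 2: find lambda with f(lambda) = 1
    while np.linalg.norm(tikhonov(lam_hi)) > 1.0:
        lam_hi *= 2.0                            # grow the bracket; f is monotonically decreasing
    while lam_hi - lam_lo > tol:
        lam_mid = 0.5 * (lam_lo + lam_hi)
        if np.linalg.norm(tikhonov(lam_mid)) > 1.0:
            lam_lo = lam_mid
        else:
            lam_hi = lam_mid
    return tikhonov(lam_hi)                      # feasible, with norm approximately 1
```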

Demo code and a solver can be found in my StackExchange Mathematics Q2399321 GitHub Repository.