When dealing with this form of matrix multiplication, always remember the Vectorization Trick with the Kronecker Product for Matrix Equations:
$$ {x}_{i}^{T} W {x}_{i} - {y}_{i} \Rightarrow \left({x}_{i}^{T} \otimes {x}_{i}^{T} \right) \operatorname{Vec} \left( W \right) - \operatorname{Vec} \left( {y}_{i} \right) = \left({x}_{i}^{T} \otimes {x}_{i}^{T} \right) \operatorname{Vec} \left( W \right) - {y}_{i} $$
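As a quick numerical sanity check of this identity (a throwaway sketch with random data):

```matlab
% Numerical check of the vectorization identity x' * W * x = kron(x', x') * vec(W)
d = 4;
x = randn(d, 1);
W = randn(d, d);
lhs = x.' * W * x;
rhs = kron(x.', x.') * W(:);    % W(:) stacks the columns of W (the vec operator)
disp(abs(lhs - rhs));           % should be at the level of machine precision
```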
Since the problem is given by summing over $ {x}_{i} $, one could build the matrix:
$$ X = \begin{bmatrix} {x}_{1}^{T} \otimes {x}_{1}^{T} \\ {x}_{2}^{T} \otimes {x}_{2}^{T} \\ \vdots \\ {x}_{n}^{T} \otimes {x}_{n}^{T} \end{bmatrix} $$
Then:
$$ \arg \min_{W} \sum_{i = 1}^{n} {\left( {x}_{i}^{T} W {x}_{i} - {y}_{i} \right)}^{2} = \arg \min_{W} {\left\| X \operatorname{Vec} \left( W \right) - \boldsymbol{y} \right\|}_{2}^{2} $$
Where $ \boldsymbol{y} $ is the column vector whose entries are the $ {y}_{i} $.
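For concreteness, building $ X $ and the unconstrained Least Squares solution could look like this (a sketch; the data in `mX` and `vY` are made up for illustration):

```matlab
% Build the Kronecker structured matrix X row by row (hypothetical data)
d  = 4;                      % dimension of each x_i
n  = 50;                     % number of samples
mX = randn(d, n);            % columns are the vectors x_i
vY = randn(n, 1);            % measurements y_i

X = zeros(n, d * d);
for ii = 1:n
    X(ii, :) = kron(mX(:, ii).', mX(:, ii).');
end

% Unconstrained Least Squares solution (ignoring the PSD constraint)
vW = X \ vY;
W  = reshape(vW, d, d);
```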
Now the above has the form of a regular Least Squares problem. The constraint can be handled using the Projected Gradient Descent Method. The projections onto the set of Symmetric Matrices and onto the cone of Positive Semi Definite (PSD) Matrices are given by:
- $ \operatorname{Proj}_{\mathcal{S}^{n}} \left( A \right) = \frac{1}{2} \left( A + {A}^{T} \right) $. See Orthogonal Projection of a Matrix onto the Set of Symmetric Matrices.
- $ \operatorname{Proj}_{\mathcal{S}_{+}^{n}} \left( A \right) = Q {\Lambda}_{+} {Q}^{T} $ where $ A = Q \Lambda {Q}^{T} $ is the eigen decomposition of $ A $ and $ {\Lambda}_{+} $ means we zero any negative values in $ \Lambda $. See Find the Matrix Projection of a Symmetric Matrix onto the set of Symmetric Positive Semi Definite (PSD) Matrices.
While the set of Symmetric Matrices is a Linear Sub Space, the PSD Cone is not. Hence the greedy iterative projection onto each set is not guaranteed to yield the orthogonal projection onto the intersection of the 2 sets. See Orthogonal Projection onto the Intersection of Convex Sets. Yet it will converge to a feasible solution.
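A minimal Projected Gradient Descent sketch along these lines (the constant step size rule and the variable names are my assumptions for illustration, not necessarily what the repository code uses; `X`, `vY` and `d` are as in the sketch above):

```matlab
% Projected Gradient Descent for min || X * vec(W) - y ||_2^2  s.t.  W symmetric PSD
stepSize = 1 / (2 * norm(X)^2);   % safe constant step for the quadratic objective
numIter  = 5000;

W = zeros(d, d);                   % symmetric PSD starting point
for kk = 1:numIter
    vW = W(:);
    vG = 2 * X.' * (X * vW - vY);          % gradient of the Least Squares objective
    W  = reshape(vW - stepSize * vG, d, d);
    W  = (W + W.') / 2;                    % projection onto the Symmetric Matrices
    [Q, L] = eig(W);
    W  = Q * max(L, 0) * Q.';              % projection onto the PSD Cone
end
```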
With all the tools above, one could create one's own solver using basic tools, with no need for external libraries (which might be slow or might not scale).
I implemented the Projected Gradient Descent Method with the above projections in MATLAB and compared the results to CVX to validate the solution.
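The CVX reference problem is along these lines (a sketch using the names from the Least Squares formulation above, not the exact script I used):

```matlab
% CVX reference solution used for validation (sketch)
cvx_begin quiet
    variable Wcvx(d, d) semidefinite    % semidefinite implies symmetric
    minimize( norm(X * Wcvx(:) - vY) )  % same minimizer as the squared norm
cvx_end
```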
My implementation is vanilla Gradient Descent with constant Step Size and no acceleration.
If you add those, you'll see convergence which is an order of magnitude faster (I guess a few tens of iterations). Not bad for a hand-made solver.
The MATLAB Code is accessible in my StackExchange Mathematics Q3619669 GitHub Repository.
Remark: The reason the greedy algorithm works is that the objective function is quadratic, which means only the symmetric part of $\boldsymbol{W}$ has any effect. On top of that, if the initial matrix is symmetric then each gradient step also generates a symmetric matrix, hence basically the only effective constraint is the definiteness of the matrix.
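Explicitly, since $ {x}_{i}^{T} W {x}_{i} $ is a scalar it equals its own transpose, so only the symmetric part of $ W $ enters each term:
$$ {x}_{i}^{T} W {x}_{i} = {\left( {x}_{i}^{T} W {x}_{i} \right)}^{T} = {x}_{i}^{T} {W}^{T} {x}_{i} \Rightarrow {x}_{i}^{T} W {x}_{i} = {x}_{i}^{T} \frac{W + {W}^{T}}{2} {x}_{i} $$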
One possibility would be to try a penalty approach. For a suitably large value of $\lambda > 0$ you can try maximizing $h(x) = f(x) -\lambda g(x)^2$ over the hyperrectangle $[a, b].$ If the maximizer has a nonzero value for $g(x),$ ratchet up $\lambda$ and try again. Since $h$ is nondifferentiable, I would be inclined to try a method like Nelder-Mead.
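A rough sketch of this strategy using MATLAB's `fminsearch` (its Nelder-Mead implementation); the `f`, `g` and box bounds below are placeholders rather than the problem's actual functions, and the box is handled with a crude extra penalty:

```matlab
% Penalty approach sketch: maximize h(x) = f(x) - lambda * g(x)^2 on the box [a, b]
f = @(x) -sum((x - 1).^2);                 % hypothetical objective to maximize
g = @(x) sum(x) - 2;                       % hypothetical constraint function (want g(x) = 0)
a = -5 * ones(3, 1);                       % box lower bounds
b =  5 * ones(3, 1);                       % box upper bounds
boxPen = @(x) sum(max(a - x, 0).^2 + max(x - b, 0).^2);   % crude handling of the box

lambda = 1;
x = (a + b) / 2;
for outer = 1:10
    negH = @(x) -(f(x) - lambda * g(x)^2) + 1e6 * boxPen(x);   % minimize -h plus box penalty
    x = fminsearch(negH, x);
    if abs(g(x)) < 1e-6
        break;                              % constraint (approximately) satisfied
    end
    lambda = 10 * lambda;                   % ratchet up the penalty and try again
end
```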
Best Answer
The problem is quasi-convex in that it is convex for every fixed $a$ and the feasible set shrinks as $a$ increases. Hence it can be solved by bisection in $a$.
Example code on bisection to explain the strategy: https://yalmip.github.io/example/decayrate/
If you use MATLAB/YALMIP, you can do it very easily using built-in functionality (here using mosek as the SDP solver).
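For illustration, a manual bisection skeleton in YALMIP might look like the following; the Lyapunov-type LMI is borrowed from the linked decay-rate example and is not this problem's actual constraint:

```matlab
% Bisection sketch modeled on the linked decay-rate example
A = [-1 2; -3 -4];
P = sdpvar(2, 2);                          % symmetric by default in YALMIP

aLow  = 0;                                 % known feasible value of the bisected parameter
aHigh = 10;                                % assumed infeasible upper bound
ops   = sdpsettings('solver', 'mosek', 'verbose', 0);

while aHigh - aLow > 1e-4
    aMid        = (aLow + aHigh) / 2;
    Constraints = [P >= eye(2), A' * P + P * A <= -2 * aMid * P];
    diagnostics = optimize(Constraints, [], ops);
    if diagnostics.problem == 0            % feasible: push the parameter up
        aLow = aMid;
    else                                   % infeasible: pull it down
        aHigh = aMid;
    end
end
aLow                                        % best feasible value found
```

The linked example also describes YALMIP's built-in bisection functionality, which automates this kind of loop.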
I believe the problem is ill-posed though (unless $D$ has certain structure relative to $A$ and $C$), which allows $Y$ to grow arbitrarily large.