Let us discuss the example you were given. Generally, this optimization method works as follows. Let $f(x,y,z)$ be the function whose critical points we want to find, subject to the constraint equation $$g(x,y,z)=k$$ for some $k \in \mathbb{R}$. We solve the following system:
$$\nabla f(x,y,z) = \lambda \nabla g(x,y,z) \\g(x,y,z)=k$$
of four equations in four unknowns ($x$, $y$, $z$, and $\lambda$); here $\nabla$ denotes the gradient operator, which returns the vector of partial derivatives with respect to $x$, $y$, and $z$.
In this case, we have $f(x,y,z)=2x+y-2z$ and $g(x,y,z)=x^2+y^2+z^2$ with $k=4$ (the constraint set is a sphere of radius $2$). Thus, we have the following system of equations:
$$\begin{cases}2 = 2\lambda x & (f_x = \lambda g_x) \\ 1 = 2\lambda y & (f_y=\lambda g_y)\\ -2 = 2\lambda z & (f_z= \lambda g_z)\\ x^2+y^2+z^2=4\end{cases}$$
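As a side note, this system can also be attacked numerically. Here is a minimal sketch using SciPy's SLSQP solver (assuming SciPy is available; the starting point `[1, 1, 1]` is an arbitrary choice), which converges to the constrained minimizer found analytically below:

```python
# Numerical constrained minimization of f(x,y,z) = 2x + y - 2z on the
# sphere x^2 + y^2 + z^2 = 4, as a cross-check of the Lagrange method.
from scipy.optimize import minimize

f = lambda p: 2*p[0] + p[1] - 2*p[2]
constraint = {'type': 'eq',
              'fun': lambda p: p[0]**2 + p[1]**2 + p[2]**2 - 4}

res = minimize(f, x0=[1.0, 1.0, 1.0],
               constraints=[constraint], method='SLSQP')
print(res.x)    # ≈ (-4/3, -2/3, 4/3), the constrained minimum
print(res.fun)  # ≈ -6
```

Note that a numerical solver only returns one critical point per starting guess; the Lagrange-multiplier computation below finds both at once.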
There are various ways to solve this system; we will proceed as follows. Multiplying the first equation by $yz$, the second by $xz$, and the third by $xy$ makes every right-hand side equal to $2\lambda xyz$, so we may set the left-hand sides equal to one another: $$2\lambda xyz = \begin{cases} 2yz \\ xz \\ -2xy \end{cases}$$
Note first that none of $x$, $y$, $z$ can be zero: $x=0$ would make the first equation read $2=0$, $y=0$ the second $1=0$, and $z=0$ the third $-2=0$. So, dividing $2yz=xz$ by $z \neq 0$ gives $x = 2y$; dividing $xz=-2xy$ by $x \neq 0$ gives $z=-2y$; and dividing $2yz = -2xy$ by $2y \neq 0$ gives $x=-z$, consistent with the first two. Substituting for $x$ and $z$ in terms of $y$ in the fourth equation, we get
$$x^2 +y^2 +z^2 =4 \implies 4y^2 + y^2 + 4y^2 = 9y^2 = 4 \implies y = \pm \frac{2}{3}$$
I will let you solve for the other three unknowns (consider each case separately: first take $y = \frac{2}{3}$ and solve for $x,z,\lambda$, then take $y=-\frac{2}{3}$ and do the same). Recall from before that $x = 2y$ and $z = -2y$, and note that the second equation gives $\lambda = \frac{1}{2y}$. You will find the two solutions
$$(x,y,z,\lambda)=\left(\pm \frac{4}{3},\pm \frac{2}{3}, \mp \frac{4}{3},\pm \frac{3}{4}\right) .$$
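As a quick sanity check, both candidate points can be plugged back into the full Lagrange system (plain Python, no libraries needed):

```python
# Verify that both solutions satisfy all four equations of the system.
for s in (1, -1):
    x, y, z, lam = s*4/3, s*2/3, -s*4/3, s*3/4
    assert abs(2 - 2*lam*x) < 1e-12               # f_x = lambda * g_x
    assert abs(1 - 2*lam*y) < 1e-12               # f_y = lambda * g_y
    assert abs(-2 - 2*lam*z) < 1e-12              # f_z = lambda * g_z
    assert abs(x**2 + y**2 + z**2 - 4) < 1e-12    # constraint g = 4
print("both points satisfy the system")
```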
These points $(x,y,z)$ are the critical points of the function $f$ under the constraint $g(x,y,z)=4$, and there are several ways to classify them (as, for instance, constrained maxima, minima, or saddle points).
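One simple classification shortcut in this case: since the sphere is compact and $f$ is continuous, $f$ attains its extreme values at the critical points, so evaluating $f$ at both is enough to identify the constrained maximum and minimum:

```python
# Evaluate f at the two critical points to classify them.
f = lambda x, y, z: 2*x + y - 2*z

print(f( 4/3,  2/3, -4/3))   # ≈ 6, the constrained maximum
print(f(-4/3, -2/3,  4/3))   # ≈ -6, the constrained minimum
```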
Best Answer
Rather than solving for the Cholesky factor directly, find a solution in terms of a less structured matrix $M$. Let a colon denote the matrix inner product, i.e. $$\eqalign{ A:B &= {\rm Tr}(AB^T) \cr M:M &= {\rm Tr}(MM^T) = \frac{1}{\mu^2} \cr }$$ Also, for typing convenience, let $$A = \frac{MM^T}{M:M},\qquad Y = Y^T = \sum_k\big(z_kz_k^T - w_kw_k^T\big).$$ Calculate the gradient of $F$: $$\eqalign{ F &= Y:A \cr dF &= Y:dA \cr &= Y:\Bigg(\frac{dM\,M^T+M\,dM^T}{M:M} - \frac{MM^T\big(dM:M+M:dM\big)}{(M:M)^2}\Bigg) \cr &= 2\mu^2\big(YM-FM\big):dM \cr \frac{\partial F}{\partial M} &= 2\mu^2\big(YM-FM\big) \cr }$$ Set the gradient to zero and solve for $M$: $$YM = FM.$$ Thus every column of $M$ must be an eigenvector of $Y$, with $F$ the corresponding eigenvalue; to minimize $F$, take the eigenvector $v_k$ associated with the minimum eigenvalue $\lambda_k$, i.e. $$k = \arg\min_j \lambda_j,\quad F = \lambda_k,\quad M = (\,v_k\;v_k\;v_k\;\ldots\,) \,=\, v_k{\large\tt 1}^T $$
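The eigenvector solution can be checked numerically. A sketch with NumPy, using randomly generated $z_k$ and $w_k$ (the dimensions are arbitrary choices for illustration):

```python
# Build Y = sum_k (z_k z_k^T - w_k w_k^T), set M = v 1^T from the minimum
# eigenvector of Y, and verify the stationarity condition Y M = F M.
import numpy as np

rng = np.random.default_rng(0)
n, K = 4, 5
Z = rng.standard_normal((n, K))
W = rng.standard_normal((n, K))
Y = Z @ Z.T - W @ W.T            # symmetric by construction

lam, V = np.linalg.eigh(Y)       # eigenvalues in ascending order
v = V[:, 0]                      # eigenvector of the minimum eigenvalue
M = np.outer(v, np.ones(n))      # M = v 1^T

A = (M @ M.T) / np.trace(M @ M.T)
F = np.trace(Y @ A.T)            # F = Y : A

print(np.allclose(Y @ M, F * M))  # True: gradient condition holds
print(np.isclose(F, lam[0]))      # True: F equals the minimum eigenvalue
```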
Given the solution in terms of $M$, recover a solution in terms of $L$. $$\eqalign{ L &= {\rm cholesky}\bigg(\frac{MM^T}{M:M}\bigg) \cr A &= LL^T = \frac{MM^T}{M:M},\quad\quad {\rm Tr}(LL^T) = \frac{M:M}{M:M} = 1 \cr }$$
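One caveat worth checking numerically: for $M = v_k{\tt 1}^T$ the matrix $A = MM^T/(M:M) = v_kv_k^T$ is rank one, so a strict Cholesky factorization is degenerate (NumPy's `cholesky` requires positive definiteness). The sketch below, with a hypothetical unit vector standing in for $v_k$, verifies only the normalization ${\rm Tr}(LL^T) = {\rm Tr}(A) = 1$ and that $A$ is positive semidefinite:

```python
# Check Tr(A) = 1 and A >= 0 for A = M M^T / (M:M) with M = v 1^T.
import numpy as np

v = np.array([0.5, -0.5, 0.5, 0.5])   # any unit vector (stand-in for v_k)
M = np.outer(v, np.ones(4))
A = (M @ M.T) / np.trace(M @ M.T)     # equals v v^T, a rank-one matrix

print(np.isclose(np.trace(A), 1.0))              # True: unit trace
print(np.all(np.linalg.eigvalsh(A) > -1e-12))    # True: PSD, but singular
```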