Calculate a solution for matrix function

matrices, matrix equations, matrix-calculus, subgradient, vector analysis

Assume that we have matrices $M\in \mathbb R^{tp\times tq}$, $H\in \mathbb R^{tq\times tq}$, $A \in \mathbb R^{a\times tq}$; vectors $z,z_{ref}\in \mathbb R^{tq}$, $y\in\mathbb R^{rp}$, $h\in \mathbb R^a$; and scalars $\rho, b, d_{\text{safe}} \in \mathbb R$.

Here $\otimes$ denotes the Kronecker product, $\odot$ the Hadamard (Schur) product, and $\|\cdot\|$ the 2-norm. $\mathbf 1_i$ is an $i$-dimensional all-ones vector, and $I_{i}$ is the $i\times i$ identity matrix.

Define a function as
$$
\mathcal L(z) = (z-z_{ref})^T H(z-z_{ref})
+ \frac{\rho}{2} \left\| Az - h \right\|^2
- b \sum\ln \left(-\psi \left(z \right)\right) \in \mathbb R
$$

where
$$
\psi(z) =
d_{\text{safe}}\mathbf1_{tr} - \left( I_{tr}\otimes \mathbf 1_{p}^T \right) \left( I_{trp} \odot \left( \left(Mz\otimes \mathbf1_{r} - \mathbf1_{t}\otimes y\right)\mathbf1_{trp}^T \right) \right)
\left(Mz\otimes \mathbf1_{r} - \mathbf1_{t}\otimes y \right)
\in \mathbb R^{tr}.
$$

Here, $\sum \ln(-\psi(z))$ means the summation of all elements of $\ln(-\psi(z))$.
In this function, all of the variables are known, except for vector $z$.
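
The definition of $\psi$ can be sanity-checked numerically. Since $I_{trp}\odot(x\mathbf 1_{trp}^T)$ is just ${\rm Diag}(x)$ for $x = Mz\otimes \mathbf1_{r} - \mathbf1_{t}\otimes y$, each entry of $\psi$ is $d_{\text{safe}}$ minus the sum of squares of $p$ consecutive entries of $x$. The NumPy sketch below compares the literal formula with that compact form; the function names and the small sizes in the usage lines are illustrative assumptions, not part of the question.

```python
import numpy as np

def psi_literal(z, M, y, d_safe, t, p, r):
    """Evaluate psi exactly as written, with Kronecker and Hadamard products."""
    n = t * r * p
    x = np.kron(M @ z, np.ones(r)) - np.kron(np.ones(t), y)
    D = np.eye(n) * np.outer(x, np.ones(n))        # I_n ⊙ (x 1^T) = Diag(x)
    S = np.kron(np.eye(t * r), np.ones((1, p)))    # I_tr ⊗ 1_p^T
    return d_safe * np.ones(t * r) - S @ D @ x

def psi_compact(z, M, y, d_safe, t, p, r):
    """Same quantity: d_safe minus sums of squares over groups of p entries."""
    x = np.kron(M @ z, np.ones(r)) - np.kron(np.ones(t), y)
    return d_safe - (x * x).reshape(t * r, p).sum(axis=1)

# Illustrative sizes and random data (assumptions for the demo only).
t, p, q, r = 2, 3, 2, 2
rng = np.random.default_rng(0)
M = rng.standard_normal((t * p, t * q))
z = rng.standard_normal(t * q)
y = rng.standard_normal(r * p)
same = np.allclose(psi_literal(z, M, y, 0.1, t, p, r),
                   psi_compact(z, M, y, 0.1, t, p, r))
```

The logarithm in $\mathcal L$ is only defined where every entry of $\psi(z)$ is negative, so any numerical method has to start from, and stay at, such a $z$.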

The objective is to minimize $\mathcal L(z)$ with respect to $z$.

How can $\nabla_z \mathcal L(z)$ be calculated, if that is possible at all?

Or is there another way to minimize this function?

And how can we solve for $z$ when $\nabla_z \mathcal L(z)=0$?

Best Answer

For consistency and typing convenience, define the following variables
$$\eqalign{
m &= tr,\quad n = trp \\
\beta &= b,\quad\; \delta = d_{safe} \\
z_0 &= z_{ref} \\
Q &= (I_m\otimes {\tt1}_p) \\
P &= (M\otimes {\tt1}_r) \\
x &= (Mz\otimes{\tt1}_r - {\tt1}_t\otimes y) = Pz - {\tt1}_t\otimes y \\
X &= {\rm Diag}(x) = I_n\odot x{\tt1}_n^T \\
w &= Q^TXx -\delta{\tt1}_m \;\doteq\; (-\psi) \\
s &= \left(\frac{{\tt1}_m}{w}\right) \quad\implies\quad s\odot w = {\tt1}_m \\
}$$
where the division defining $s$ is elementwise and $w$ must be elementwise positive for the logarithm to exist.

As in a previous question, differentiate $w$, using $Xx = x\odot x$ and $dx = P\,dz$:
$$\eqalign{
dw &= 2Q^TX\,dx = 2Q^TXP\,dz\\
d\log(w) &= s\odot dw \\
&= 2s\odot Q^TXP\,dz \\
}$$
Write the current function in terms of these new variables. Then calculate its gradient.
$$\eqalign{
{\cal L} &= H:(z-z_0)(z-z_0)^T + \frac{\rho}{2}(Az-h):(Az-h) -\beta{\tt1}_m:\log(w) \\
d{\cal L} &= \left(H+H^T\right)(z-z_0):dz + \rho(Az-h):A\,dz - \beta{\tt1}_m:(2s\odot Q^TXP\,dz)\\
&= \left(H+H^T\right)(z-z_0):dz + \rho(Az-h):A\,dz -2\beta s:Q^TXP\,dz \\
&= \Big[\left(H+H^T\right)(z-z_0) + \rho A^T(Az-h) -2\beta P^TXQs\Big]:dz \\
\frac{\partial{\cal L}}{\partial z} &= \left(H+H^T\right)(z-z_0) + \rho A^T(Az-h) -2\beta P^TXQs \\
}$$
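
A gradient formula of this kind is easy to get wrong by a transpose or a dimension, so it is worth comparing against central finite differences. The sketch below is one way to do that in NumPy. It writes the Jacobian of $x$ with respect to $z$ as $M\otimes{\tt1}_r$, which follows from $Mz\otimes{\tt1}_r = (M\otimes{\tt1}_r)z$. The function names, the random test data, and the choice of $\delta$ (set to half the smallest group sum of squares so that $w>0$ at the test point) are assumptions made only for this check.

```python
import numpy as np

def build_operators(M, y, t, p, r):
    """Q = I_m ⊗ 1_p (m = t*r), P = M ⊗ 1_r, and the constant c = 1_t ⊗ y."""
    Q = np.kron(np.eye(t * r), np.ones((p, 1)))   # shape (t*r*p, t*r)
    P = np.kron(M, np.ones((r, 1)))               # shape (t*r*p, t*q)
    c = np.kron(np.ones(t), y)                    # shape (t*r*p,)
    return Q, P, c

def loss(z, H, A, h, z0, rho, beta, delta, Q, P, c):
    x = P @ z - c                                 # x = Mz ⊗ 1_r - 1_t ⊗ y
    w = Q.T @ (x * x) - delta                     # w = -psi(z)
    if np.any(w <= 0):
        raise ValueError("log barrier undefined: need w > 0 elementwise")
    e, res = z - z0, A @ z - h
    return e @ H @ e + 0.5 * rho * (res @ res) - beta * np.sum(np.log(w))

def grad(z, H, A, h, z0, rho, beta, delta, Q, P, c):
    x = P @ z - c
    s = 1.0 / (Q.T @ (x * x) - delta)             # elementwise reciprocal of w
    barrier = P.T @ (x * (Q @ s))                 # (M ⊗ 1_r)^T X Q s
    return (H + H.T) @ (z - z0) + rho * A.T @ (A @ z - h) - 2.0 * beta * barrier

def gradient_check(t=2, p=3, q=2, r=2, a=4, seed=0, eps=1e-6):
    """Return (max abs difference to central differences, max abs gradient entry)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((t * p, t * q))
    H = rng.standard_normal((t * q, t * q))       # deliberately not symmetric
    A = rng.standard_normal((a, t * q))
    z, z0 = rng.standard_normal(t * q), rng.standard_normal(t * q)
    h, y = rng.standard_normal(a), rng.standard_normal(r * p)
    Q, P, c = build_operators(M, y, t, p, r)
    x = P @ z - c
    delta = 0.5 * np.min(Q.T @ (x * x))           # assumption: keeps w > 0 at z
    args = (H, A, h, z0, 1.5, 0.7, delta, Q, P, c)
    g = grad(z, *args)
    g_fd = np.array([(loss(z + eps * e_i, *args) - loss(z - eps * e_i, *args)) / (2 * eps)
                     for e_i in np.eye(z.size)])
    return np.max(np.abs(g - g_fd)), np.max(np.abs(g))

err, scale = gradient_check()
```

If `err` is small relative to `scale`, the analytic gradient agrees with the numerical one. Because $s$ and $X$ both depend on $z$, the stationarity condition is nonlinear in $z$, so in practice the gradient would be passed to an iterative first-order method that keeps $w>0$ at every step.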


In the above derivation, a colon is used to denote the trace/Frobenius product, i.e.
$$\eqalign{ A:B &= {\rm Tr}(A^TB) = {\rm Tr}(AB^T) = B:A \\ }$$
The cyclic property of the trace allows products to be rearranged in various ways, e.g.
$$A:BC \;=\; AC^T:B \;=\; B^TA:C \;=\; A^T:(BC)^T$$
In addition, the Frobenius and Hadamard products commute with each other.
$$A:B\odot C = A\odot B:C$$
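
These product rules can be confirmed numerically on random matrices. The short NumPy check below is only an illustration; the helper `frob` and the matrix sizes are assumptions chosen so that every product is conformable.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = Tr(A^T B), i.e. the sum of elementwise products."""
    return float(np.sum(A * B))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

base = frob(A, B @ C)
trace_ok = np.isclose(base, np.trace(A.T @ B @ C))     # A:BC = Tr(A^T B C)
cyclic_ok = (np.isclose(base, frob(A @ C.T, B))        # A:BC = AC^T:B
             and np.isclose(base, frob(B.T @ A, C))    # A:BC = B^T A:C
             and np.isclose(base, frob(A.T, (B @ C).T)))

# Hadamard rule needs three matrices of the same shape.
D, E, F = (rng.standard_normal((3, 5)) for _ in range(3))
hadamard_ok = np.isclose(frob(D, E * F), frob(D * E, F))  # D:E⊙F = D⊙E:F
```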