[Math] Minimization with complex gradient descent

derivativesgradient descenthadamard-productmatricesmatrix-calculus

I want to compute the gradient of a function $f$ w.r.t to $\mathbf{d}$ which is a real-valued vector but there are also complex matrices and vectors involved. I would normally compute the gradient w.r.t. $\mathbf{d^{*}}$ to go towards the maximum change but since the vector is real it seems that the conjugate function does not play a role here. Anyway, something in my derivation of the gradient seems to be wrong because my matlab program is not converging to a solution. My program is ok, I have checked it several times. Hope you can help guys!

$ f =\mathbf{\|\Phi\|}_2^2$ with $f \in \mathbb{R}$ and $\Phi \in \mathbb{R}^{Nx1}$

$\mathbf{\Phi = y \odot y^{*}}$ where $\mathbf{y} \in \mathbb{C}^{Nx1}$

$\mathbf{y = AXw}$ with $\mathbf{A}$ and $\mathbf{X}$ are Hermitian of dimensions NxN

$\mathbf{w} \in \mathbb{C}^{Nx1}$ is formed by some function of the elements in vector $\mathbf{d} \in \mathbb{R}^{Mx1}$

The gradient $\nabla f$ w.r.t to $\mathbf{d}$ is

$\nabla f = \frac{\partial \mathbf{w}}{\partial \mathbf{d}} \frac{\partial \mathbf{y}}{\partial \mathbf{w}} \frac{\partial \mathbf{\Phi}}{\partial \mathbf{y}} \frac{\partial f}{\partial \mathbf{\Phi}}$

$\frac{\partial \mathbf{w}}{\partial \mathbf{d}} = \mathbf{Z}$, where $\mathbf{Z} \in \mathbb{C}^{MxN}$

$\frac{\partial \mathbf{y}}{\partial \mathbf{w}} = \mathbf{AX}$

$\frac{\partial \mathbf{\Phi}}{\partial \mathbf{y}} =diag(\mathbf{y^{*}})$

$\frac{\partial f}{\partial \mathbf{\Phi}} = 2\mathbf{\Phi}$

My update equation is $\mathbf{d = d} – u Real\{\nabla f\}$ in order to enforce the vector $\mathbf{d}$ remain real but the algorithm is not converging. Any mistakes that you could have detected here? Is my logic wrong? How could the function $f$ be minimized then?

Best Answer


For the vector $d$ let's use $e,\,$because $d$ is too easily confused with differential/derivative operators.
For the vector $\Phi$ let's use $p,\,$because it's easier to type.
I'll use $B$ in place of $(AX)$ for the same reason.

And instead of the chain rule, let's attack the problem with successive substitutions in differential expressions.

We're given the relationships $$\eqalign{ w &= Z^Te \cr y &= Bw \cr p &= y\odot y^* \cr P &= \operatorname{Diag}(p) \cr f &= p:p \cr }$$ where $\odot$ represents the Hadamard element-wise product, and $:$ the Frobenius inner product.

Now let's find the differential and gradient of $f$ $$\eqalign{ df &= 2p:dp \cr &= 2p:(y^*\odot dy + y\odot dy^*) \cr &= 2p\odot y^*:dy + 2p\odot y:dy^* \cr &= 2Py^*:dy + 2Py:dy^* \cr &= 2Py^*:Bdw + 2Py:B^*dw^* \cr &= 2B^TPy^*:dw + 2B^HPy:dw^* \cr &= 2B^TPy^*:Z^Tde + 2B^HPy:Z^Hde^* \cr &= 2(ZB^TPy^* + Z^*B^HPy):de \cr\cr \frac{\partial f}{\partial e} &= 2(ZB^TPy^* + Z^*B^HPy) \cr &= 4\,\operatorname{Re}(ZB^TPy^*) \cr }$$ Translating this back into your preferred notation $$\eqalign{ \frac{\partial f}{\partial d} &= 4\,\operatorname{Re}(ZX^TA^T(\Phi\odot y^*)) \cr\cr }$$ In the above, I made use of the fact that $(de^*=de),\,$since it's a real vector.

I also used the commutivity (and mutual commutivity) of the Hadamard and Frobenius products, i.e. $$\eqalign{ A:B &= B:A \cr A\odot B &= B\odot A \cr A:B\odot C &= A\odot B:C \cr }$$ and the mixed-product rules for Frobenius-Matrix products $$\eqalign{ A:BC &= B^TA:C \cr A:BC &= AC^T:B \cr }$$ which follow from the Frobenius-Trace equivalence $$A:B = \operatorname{tr}(A^TB)$$ and the cyclic properties of the trace.

Related Question