[Math] How to calculate the gradient of matrix equation

matrices, matrix-calculus, multivariable-calculus

Short question: How do I calculate the gradient of the $MSE(a, b)$ equation below?


Longer explanation: This problem arose while I was following a derivation of the optimal beam vector $a$ for a data transmission. The mean square error (MSE) of this data transmission is calculated as follows:

$$MSE(a, b) = a^H(Hbb^HH^H+R_n)a + 1 - a^HHb - b^HH^Ha$$

where:

  • $a$, $b$: vectors, which can be chosen
  • $H$, $R_n$: matrices, which are fixed
  • $a^H$: denotes the Hermitian adjoint of $a$

The vector $a$ can be optimized (as a function of $b$) by setting the gradient of the MSE with respect to $a$ to zero.

The problem is that I don't know how to calculate the gradient when the equation has the above form. The $a^H$ at the beginning and the $a$ at the end of the first summand confuse me…

The answer should be:

$$ a^* = (Hbb^HH^H+R_n)^{-1}Hb = R_n^{-1}Hb\frac{1}{1+b^HH^HR_n^{-1}Hb}$$
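
Both forms of this expected result can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy, randomly generated complex test data, and a Hermitian positive definite $R_n$; the dimensions, the seed, and the helper `crandn` are illustrative choices, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3  # hypothetical sizes: a has n entries, b has m entries


def crandn(*shape):
    """Random complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


H = crandn(n, m)
b = crandn(m)
G = crandn(n, n)
R_n = G @ G.conj().T + n * np.eye(n)  # assumed Hermitian positive definite

c = H @ b                              # shorthand c = H b
B = np.outer(c, c.conj()) + R_n        # B = H b b^H H^H + R_n


def mse(a):
    """MSE(a, b) = a^H B a + 1 - a^H H b - b^H H^H a (real valued)."""
    return (a.conj() @ B @ a + 1 - a.conj() @ c - c.conj() @ a).real


# First form: a = (H b b^H H^H + R_n)^{-1} H b
a_first = np.linalg.solve(B, c)

# Second form: a = R_n^{-1} H b / (1 + b^H H^H R_n^{-1} H b)
Rinv_c = np.linalg.solve(R_n, c)
a_second = Rinv_c / (1 + c.conj() @ Rinv_c)

print("forms agree:", np.allclose(a_first, a_second))

# Small random perturbations of a should never decrease the MSE.
worse = all(mse(a_first + 1e-3 * crandn(n)) > mse(a_first) for _ in range(100))
print("perturbations increase MSE:", worse)
```

This only checks the formula on one random instance; it does not replace the derivation.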

But how to calculate this?


Update:

Using equations from The Matrix Cookbook I got this far:

$$\frac{\partial MSE(a, b)}{\partial a} = \frac{\partial}{\partial a} \left[ a^H\left(Hbb^HH^H+R_n\right)a\right] + \frac{\partial}{\partial a} 1 - \frac{\partial}{\partial a} \left[a^HHb\right] - \frac{\partial}{\partial a} \left[b^HH^Ha\right]$$

With

  • $\frac{\partial}{\partial a} 1 = 0$
  • $\frac{\partial b^TX^TDXc}{\partial X} = D^TXbc^T + DXcb^T$ (Cookbook (74))

I get:

$$\frac{\partial MSE(a, b)}{\partial a} = (Hbb^HH^H+R_n)^Ha + (Hbb^HH^H+R_n)a - \frac{\partial}{\partial a} \left[a^HHb\right] - \frac{\partial}{\partial a} \left[b^HH^Ha\right]$$

And that's it. I don't even know if I applied equation (74) from the Cookbook correctly, but it was the closest match for the first summand. I'm sorry, I just don't get it…

Best Answer

I'm not sure whether the following results hold for complex cases.

Let all the vectors and matrices be real valued. Then $$A=a^TBa+1-a^THb-b^TH^Ta$$ where $B=Hbb^TH^T+R_n$; $B$ is symmetric if $R_n$ is symmetric. The differential is $$dA=da^TBa+a^TBda-da^THb-b^TH^Tda$$ Each term is a scalar and so equals its own transpose, hence $dA=\left(a^T(B^T+B)-2b^TH^T\right)da$. Setting the gradient to zero gives $$a^T(B^T+B)-2b^TH^T=0$$ If $B$ is symmetric, this becomes $2Ba=2Hb$, which implies $$a^*=B^{-1}Hb=(Hbb^TH^T+R_n)^{-1}Hb$$
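
For the complex case, one route that avoids complex differentiation altogether is completing the square. This is a sketch under the assumption that $R_n$ is Hermitian positive definite (as a noise covariance matrix typically is), so that $B=Hbb^HH^H+R_n$ is Hermitian positive definite as well; the shorthand $c=Hb$ is introduced here only for brevity.

$$MSE(a,b)=a^HBa+1-a^Hc-c^Ha=\left(a-B^{-1}c\right)^HB\left(a-B^{-1}c\right)+1-c^HB^{-1}c$$

Expanding the quadratic term on the right and using $B^H=B$ recovers the left-hand side. Since $B$ is positive definite, the quadratic term is nonnegative and vanishes only for $a=B^{-1}c$, so

$$a^*=B^{-1}c=(Hbb^HH^H+R_n)^{-1}Hb$$

The same stationary point results from Wirtinger calculus, where $a$ and $a^H$ are treated as independent variables: $\frac{\partial MSE}{\partial a^H}=Ba-Hb=0$.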

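The rest of your expected answer follows from the Sherman-Morrison formula. With $c=Hb$, $R_n$ invertible and $1+c^TR_n^{-1}c\neq 0$, $$(cc^T+R_n)^{-1}c=R_n^{-1}c-\frac{R_n^{-1}cc^TR_n^{-1}c}{1+c^TR_n^{-1}c}=\frac{R_n^{-1}c}{c^TR_n^{-1}c+1}$$ As a direct check, multiply $R_n^{-1}c$ by $cc^T+R_n$: $$(cc^T+R_n)R_n^{-1}c=c\,(c^TR_n^{-1}c)+c=c\,(c^TR_n^{-1}c+1)$$ Because $c^TR_n^{-1}c$ is a scalar, dividing by $c^TR_n^{-1}c+1$ gives exactly the claimed equality, so it holds in any dimension, not only in dimension one.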
