Statistics – ‘Trace Trick’ for Expectations of Quadratic Forms

expected value, matrices, statistics, trace

I am trying to understand the proof for the Kullback-Leibler divergence between two multivariate normal distributions. Along the way, a sort of trace trick is applied for the expectation of the quadratic form $$E[ (x-\mu)^T \Sigma^{-1} (x-\mu) ]= \operatorname{trace}\left(E[(x-\mu)(x-\mu)^T] \, \Sigma^{-1}\right),$$

where $x$ is MV-normal with mean $\mu$ and covariance matrix $\Sigma$. The expectation is taken over $x$.

I would like to understand why this identity holds. I think more than one step is taken at once. I believe the cyclic property gives $\operatorname{trace}(E[(x-\mu)(x-\mu)^T] \Sigma^{-1}) = \operatorname{trace}(E[(x-\mu)^T \Sigma^{-1} (x-\mu)])$, but where does the trace come from in the first place?

Best Answer

Where does the trace come from?

A real number can be thought of as a $1 \times 1$ matrix, and its trace is the number itself. Thus $$(x-\mu)^\top \Sigma^{-1} (x-\mu) = \operatorname{tr}\left((x-\mu)^\top \Sigma^{-1} (x-\mu)\right).$$
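A quick NumPy sketch (not from the original answer; the variable names and the random test matrices are my own) checks both facts for a concrete vector: the quadratic form is a scalar equal to its own trace, and cyclically rotating the factors does not change the trace.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.standard_normal(d)
mu = rng.standard_normal(d)
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # symmetric positive definite
Sigma_inv = np.linalg.inv(Sigma)

v = x - mu
quad = v @ Sigma_inv @ v             # the scalar (x-mu)^T Sigma^{-1} (x-mu)

# cyclic property: tr(v^T S v) = tr(v v^T S)
as_trace = np.trace(np.outer(v, v) @ Sigma_inv)

print(np.isclose(quad, as_trace))    # True
```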

More than one step is taken at once.

After applying the above step, use the cyclic property of the trace to obtain $$\operatorname{tr}\left((x-\mu)^\top \Sigma^{-1} (x-\mu)\right) = \operatorname{tr}\left((x-\mu)(x-\mu)^\top \Sigma^{-1} \right).$$ Because the trace is a linear operator, expectation and trace commute, so you can push the expectation inside: $$E \operatorname{tr}\left((x-\mu)(x-\mu)^\top \Sigma^{-1} \right) = \operatorname{tr}\left(E\left[(x-\mu)(x-\mu)^\top\right] \Sigma^{-1} \right).$$
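The full identity can also be checked numerically. A Monte Carlo sketch (my own, not from the answer): since $E[(x-\mu)(x-\mu)^\top] = \Sigma$, both sides should equal $\operatorname{tr}(\Sigma \Sigma^{-1}) = d$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 200_000
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)      # symmetric positive definite
mu = rng.standard_normal(d)
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=n)
V = X - mu

# left side: sample mean of the quadratic forms (x-mu)^T Sigma^{-1} (x-mu)
lhs = np.einsum('ij,jk,ik->i', V, Sigma_inv, V).mean()

# right side: trace of (sample covariance) @ Sigma^{-1}
rhs = np.trace((V.T @ V / n) @ Sigma_inv)

# both approximate tr(Sigma @ Sigma^{-1}) = tr(I_d) = d
print(lhs, rhs, d)
```

The two sample quantities agree exactly up to floating point (they are algebraic rearrangements of the same sum), and both converge to $d$ as $n$ grows.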

Related Question