[Math] Trace of a Matrix: when to use it? What is the trace trick?

Tags: linear-algebra, matrices, matrix-calculus, maximum-likelihood, statistics

When calculating the log-likelihood function for some multivariate distributions, such as the multivariate normal, I see examples where expressions are suddenly rewritten as a trace, even when the matrix is not diagonal. I searched online for a plausible explanation of this "trace trick" without success. What is it all about?

Can someone clarify the usage of trace in this situation?

Below is a slide with an example of this use of the trace.


Best Answer

The trace is invariant under cyclic permutations of a product (whenever the products are defined), though not under arbitrary permutations. This means $\text{Tr}(\mathbf{ABC}) = \text{Tr}(\mathbf{CAB}) = \text{Tr}(\mathbf{BCA})$. The terms of the form $(\mathbf{x}_n-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})$ are scalars (or, if you like, $1\times1$ matrices), and the trace of a scalar is just the scalar. Note also that the trace is linear, so $\text{Tr}(\alpha\mathbf{A}+\beta\mathbf{B}) = \alpha\,\text{Tr}(\mathbf{A}) + \beta\,\text{Tr}(\mathbf{B})$, which they use right underneath where you circled. This trick is used a lot, especially with quadratic forms (i.e. $\mathbf{x}^T\mathbf{Qx}$, where $\mathbf{Q}$ is symmetric).
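Both properties are easy to check numerically. Below is a small pure-Python sketch; the helper names `matmul`, `trace` and `add_scaled` are my own illustrative choices, not from any library. It deliberately uses non-square factors ($2\times3$, $3\times4$, $4\times2$), so the three cyclic products have different sizes but the same trace.

```python
def matmul(A, B):
    """Return the product A B for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def add_scaled(alpha, A, beta, B):
    """Return alpha*A + beta*B, entry by entry."""
    return [[alpha * a + beta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# A is 2x3, B is 3x4, C is 4x2: ABC is 2x2, BCA is 3x3, CAB is 4x4.
A = [[1, 2, 0], [3, -1, 4]]
B = [[2, 0, 1, 5], [1, 3, -2, 0], [0, 1, 1, 2]]
C = [[1, 0], [2, 1], [-1, 3], [0, 2]]

t_abc = trace(matmul(matmul(A, B), C))
t_bca = trace(matmul(matmul(B, C), A))
t_cab = trace(matmul(matmul(C, A), B))
print(t_abc, t_bca, t_cab)  # 93 93 93

# Linearity: Tr(3P - 2Q) = 3 Tr(P) - 2 Tr(Q)
P = [[1, 2], [3, 4]]
Q = [[0, 5], [-1, 2]]
print(trace(add_scaled(3, P, -2, Q)), 3 * trace(P) - 2 * trace(Q))  # 11 11
```

Integer entries keep the arithmetic exact, so the equalities hold without any floating-point tolerance.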

All they do is replace "$\text{scalar}$" with "$\text{Tr}(\text{scalar})$" and then apply the cyclic permutation property. First, $(\mathbf{x}_n-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu}) = \text{Tr}\big((\mathbf{x}_n-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})\big)$ because $(\mathbf{x}_n-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})$ is just a scalar. Then $\text{Tr}\big((\mathbf{x}_n-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})\big) = \text{Tr}\big(\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})(\mathbf{x}_n-\boldsymbol{\mu})^T\big)$ by the permutation property I mentioned.
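The payoff comes after summing over $n$: by linearity, $\sum_n \text{Tr}\big(\boldsymbol{\Sigma}^{-1}(\mathbf{x}_n-\boldsymbol{\mu})(\mathbf{x}_n-\boldsymbol{\mu})^T\big) = \text{Tr}\big(\boldsymbol{\Sigma}^{-1}\mathbf{S}\big)$ with $\mathbf{S} = \sum_n (\mathbf{x}_n-\boldsymbol{\mu})(\mathbf{x}_n-\boldsymbol{\mu})^T$, so the data enter only through the single matrix $\mathbf{S}$. The pure-Python sketch below checks both the per-term identity and the summed one. The values of `Sigma_inv`, `mu` and `xs` are made-up illustrative data (any matrix works for the identity itself), and the helper names are my own.

```python
def matmul(A, B):
    """Return the product A B for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical data: Sigma_inv stands in for the inverse covariance,
# mu for the mean, xs for the observations x_n (all 2x1 column vectors).
Sigma_inv = [[2, 1], [1, 3]]
mu = [[1], [0]]
xs = [[[3], [1]], [[0], [2]], [[-1], [4]]]

lhs = 0               # running sum of the scalars (x_n - mu)^T Sigma_inv (x_n - mu)
S = [[0, 0], [0, 0]]  # scatter matrix: sum_n (x_n - mu)(x_n - mu)^T
for x in xs:
    d = [[xi[0] - mi[0]] for xi, mi in zip(x, mu)]     # x_n - mu, a 2x1 column
    quad = matmul(matmul(transpose(d), Sigma_inv), d)  # 1x1 matrix
    outer = matmul(d, transpose(d))                     # 2x2 matrix
    # The trace of a 1x1 matrix is its only entry; cycling moves d^T to the end.
    assert quad[0][0] == trace(matmul(Sigma_inv, outer))
    lhs += quad[0][0]
    S = matadd(S, outer)

# Linearity moves the sum inside the trace: sum_n Tr(Sigma_inv D_n) = Tr(Sigma_inv S)
rhs = trace(matmul(Sigma_inv, S))
print(lhs, rhs)  # 65 65
```

Writing the sum as $\text{Tr}(\boldsymbol{\Sigma}^{-1}\mathbf{S})$ is what makes the log-likelihood convenient to differentiate with respect to $\boldsymbol{\Sigma}^{-1}$.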