Why can the covariance matrix be computed as $\frac{X X^T}{n-1}$?

computational-statistics, covariance, covariance-matrix, descriptive-statistics

Wikipedia defines the covariance matrix as a generalization of the scalar variance: $C=E[(X-E[X])(X-E[X])^T]$. But I usually see statisticians compute the covariance matrix as $C=\frac{X X^T}{n-1}$ (e.g., amoeba's nice answer here: https://stats.stackexchange.com/a/134283/83526). Is $E[(X-E[X])(X-E[X])^T] = \frac{X X^T}{n-1}$? Are both of these correct definitions of the covariance matrix?

Best Answer

Let $\mu = E(X)$. Then $$Var(X) = E\left((X - \mu)(X - \mu)^T\right) = E\left(XX^T - \mu X^T - X \mu^T + \mu \mu^T\right) \\ = E(XX^T) - \mu\mu^T,$$ since $E(\mu X^T) = \mu E(X)^T = \mu\mu^T$ and likewise $E(X \mu^T) = \mu\mu^T$. This generalizes the well-known scalar identity $Var(Z) = E(Z^2) - E(Z)^2$.
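A quick numerical check of this identity (a minimal sketch using numpy; the particular $\mu$, $\Sigma$, seed, and sample size are arbitrary choices, not from the original answer): draw many copies of the random vector $X$, estimate $E(XX^T)$ by a sample average, and subtract $\mu\mu^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw many samples of X ~ N(mu, Sigma); rows are independent draws.
xs = rng.multivariate_normal(mu, Sigma, size=200_000)   # shape (N, 2)

# Monte Carlo estimate of E[X X^T].
E_XXt = xs.T @ xs / len(xs)

# The identity Var(X) = E[X X^T] - mu mu^T, checked numerically.
print(E_XXt - np.outer(mu, mu))   # approximately Sigma
print(Sigma)
```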

The natural estimator of $\Sigma := Var(X)$, given a data matrix $X$ whose $n$ columns are the observations, is $\hat \Sigma = \frac 1{n-1}XX^T - \hat \mu \hat \mu^T$, where $\hat\mu$ is the vector of sample means.
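As a sketch of how this estimator relates to a standard library routine (assuming numpy, with variables in rows and observations in columns; the shapes and seed below are illustrative): `np.cov` subtracts $\hat\mu$ from the data before forming the product, so it differs from the plug-in $\hat\Sigma$ above by exactly $\frac{1}{n-1}\hat\mu\hat\mu^T$, a term that shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 500
X = rng.normal(size=(p, n))                      # p variables, n observations

mu_hat = X.mean(axis=1, keepdims=True)           # sample mean, shape (p, 1)
Sigma_hat = X @ X.T / (n - 1) - mu_hat @ mu_hat.T

# np.cov centers the columns first; algebraically the two estimates
# differ by exactly mu_hat mu_hat^T / (n - 1).
S = np.cov(X, ddof=1)
print(np.max(np.abs(Sigma_hat - S - mu_hat @ mu_hat.T / (n - 1))))   # ~ 1e-16
```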

In many situations we can take $\mu = 0$ without any loss of generality; one common example is PCA. If we center the data so that each variable has sample mean zero, then $\hat \mu = 0$ and our estimate of the covariance is simply $\frac 1{n-1}XX^T$. The univariate analogue of this is the familiar $s^2 = \frac 1{n-1} \sum_i x_i^2$ when $\bar x = 0$.
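A minimal sketch of the centered case (again assuming numpy and the variables-in-rows convention): once each variable's mean is subtracted, $\frac 1{n-1}XX^T$ coincides exactly with the library covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 4, 100
X = rng.normal(size=(p, n))              # p variables in rows, n observations

Xc = X - X.mean(axis=1, keepdims=True)   # center each variable, so mu_hat = 0

# With centered data the estimator reduces to X X^T / (n - 1),
# which matches numpy's covariance exactly.
print(np.allclose(Xc @ Xc.T / (n - 1), np.cov(X, ddof=1)))   # True
```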

As @Christoph Hanck points out in the comments, you need to distinguish between estimates and parameters here. There is only one definition of $\Sigma$, namely $E\left((X - \mu)(X - \mu)^T\right)$. So $\frac 1{n-1}XX^T$ is absolutely not the correct definition of the population covariance, but when $X$ is the centered data matrix (so $\hat\mu = 0$) it is an unbiased estimate of it, i.e. $\Sigma = E\left(\frac 1{n-1}XX^T\right)$. (If instead the true mean is known to be $0$ and the data are left uncentered, the unbiased scaling is $\frac 1n XX^T$.)
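A Monte Carlo sketch of this unbiasedness claim (assuming numpy; the chosen $\Sigma$, dimensions, and replication count are arbitrary): averaging $\frac 1{n-1}X_cX_c^T$ over many simulated datasets recovers $\Sigma$ even at small $n$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 2, 10, 20_000
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
L = np.linalg.cholesky(Sigma)

acc = np.zeros((p, p))
for _ in range(reps):
    X = L @ rng.normal(size=(p, n))            # columns i.i.d. N(0, Sigma)
    Xc = X - X.mean(axis=1, keepdims=True)     # center, so mu_hat = 0
    acc += Xc @ Xc.T / (n - 1)

print(acc / reps)   # approximately Sigma: the centered estimator is unbiased
print(Sigma)
```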
