Statistics – Unbiased Estimate of Covariance

covariance, statistics

How can I prove that
$$
\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)
$$
is an unbiased estimate of the covariance $\operatorname{Cov}(X, Y)$
where $\bar X = \dfrac 1 n \sum_{i=1}^n X_i$ and $\bar Y = \dfrac 1 n \sum_{i=1}^n Y_i$, and $(X_1, Y_1), \ldots, (X_n, Y_n)$ is an independent sample from the random vector $(X, Y)$?

Best Answer

Additional comment, after some thought, following an exchange of comments with @MichaelHardy:

His answer closely parallels the usual demonstration that $E(S^2) = \sigma^2$ and is easy to follow. However, the proof below, in abbreviated notation that I hope is not too cryptic, may be more direct.

$$(n-1)S_{xy} = \sum(X_i-\bar X)(Y_i - \bar Y) = \sum X_i Y_i -n\bar X \bar Y = \sum X_i Y_i - \frac{1}{n}\sum X_i \sum Y_i.$$
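For readers who want the middle equality spelled out: expanding the product and using $\sum X_i = n\bar X$ and $\sum Y_i = n\bar Y$ gives

$$\sum(X_i-\bar X)(Y_i - \bar Y) = \sum X_i Y_i - \bar Y\sum X_i - \bar X\sum Y_i + n\bar X \bar Y = \sum X_i Y_i - n\bar X \bar Y.$$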

Hence,

$$\begin{aligned}
(n-1)E(S_{xy}) &= E\left(\sum X_i Y_i\right) - \frac{1}{n}E\left(\sum X_i \sum Y_i\right)\\
&= n\mu_{xy} - \frac{1}{n}\left[n\mu_{xy} + n(n-1)\mu_x \mu_y\right]\\
&= (n-1)[\mu_{xy}-\mu_x\mu_y] = (n-1)\sigma_{xy}.
\end{aligned}$$

So the expectation of the sample covariance $S_{xy}$ is the population covariance $\sigma_{xy} = \operatorname{Cov}(X,Y),$ as claimed.

Note that $\operatorname{E}\left(\sum X_i \sum Y_i\right)$ has $n^2$ terms: $n$ of the form $\operatorname{E}(X_iY_i) = \mu_{xy}$, and $n(n-1)$ of the form $\operatorname{E}(X_iY_j) = \mu_x\mu_y$ with $i \ne j$.
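Spelling that out, with the cross terms factoring because $(X_i, Y_i)$ and $(X_j, Y_j)$ are independent for $i \ne j$:

$$\operatorname{E}\left(\sum_i X_i \sum_j Y_j\right) = \sum_{i=1}^n \operatorname{E}(X_i Y_i) + \sum_{i \ne j} \operatorname{E}(X_i)\operatorname{E}(Y_j) = n\mu_{xy} + n(n-1)\mu_x \mu_y.$$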

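As a quick numerical sanity check (not part of the original derivation), here is a minimal Monte Carlo sketch in Python, assuming NumPy is available; the bivariate normal model, the seed, and the sizes `n` and `reps` are illustrative choices, not anything fixed by the question.

```python
import numpy as np

rng = np.random.default_rng(2024)   # illustrative seed

n, reps = 5, 200_000                # small n makes any bias easy to see
rho = 0.6                           # true Cov(X, Y) for unit-variance normals
cov = [[1.0, rho],
       [rho, 1.0]]                  # true covariance matrix of (X, Y)

est = np.empty(reps)
for r in range(reps):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x, y = xy[:, 0], xy[:, 1]
    # sample covariance with the 1/(n-1) divisor, as in the question
    est[r] = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print(est.mean())   # averages out close to rho = 0.6, consistent with unbiasedness
```

Dividing by $n$ instead of $n-1$ in the same simulation gives a mean near $\frac{n-1}{n}\rho = 0.48$, which is exactly the bias the $n-1$ divisor removes. (For reference, `np.cov(x, y)[0, 1]` computes the same statistic, since its default divisor is $n-1$.)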