Solved – Variance of a sample covariance for normal variables

covariance, estimation, multivariate normal distribution

In normally distributed populations, the sample variance $s^2$ follows a scaled chi-squared distribution, $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$, and the variance of this estimator is:

$$\text{Var}(s^2) = \frac{2\sigma^4}{n-1}$$
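A quick numerical check of this rate (a minimal numpy sketch; the values of $n$, $\sigma$ and the replication count below are arbitrary illustrative choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 10, 2.0, 200_000   # arbitrary illustrative values

# draw `reps` independent normal samples of size n and compute the unbiased sample variance of each
samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)

print("empirical Var(s^2):        ", s2.var(ddof=1))
print("theoretical 2*sigma^4/(n-1):", 2 * sigma**4 / (n - 1))
```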

I am wondering how this result generalises to covariances.

In particular, I am looking at the vector $\mathbf{X}=[X_1, X_2]^T$ with distribution $N(\mathbf{0}, \mathbf{\Sigma})$. Given that $\mathbf{Q}$ is the sample covariance
$$\mathbf{Q} = \frac{1}{n-1}\sum_{i=1}^n \mathbf{x}_i\mathbf{x}_i^T$$
what is the distribution of $\mathbf{Q}$ and $\text{Var}(\mathbf{Q})$?

Best Answer

The OP is interested in Var(sample covariances) in a bivariate Normal world.

  • You know the solution for: Var(sample variances) (main diagonal), so ...
  • All that is needed is the solution for: Var(sample covariance)

Then there is no need for any matrix notation whatsoever, and if I understand correctly, the question reduces to:

Question: Let $(X,Y)$ be bivariate Normal. Given a sample of size $n$, namely $\big( (X_1, Y_1), \dots, (X_n, Y_n) \big)$, find $\text{Var}(W)$, where the sample covariance $W$ is defined as:

$$W = \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i-\bar Y). $$

Answer: This problem can be solved much more generally. We will obtain a general solution to $\text{Var}(W)$ for any distribution whose moments exist, and then solve for the Normal as a special case. The modus operandi for solving such problems is to work with power sum notation, which in our bivariate world is of the form $s_{r,t}$, namely:

$$s_{r,t}=\sum _{i=1}^n X_i^r Y_i^t$$

In particular, the sample covariance $W$ can be written in power sum notation as:

$$W = \frac{1}{n-1}\left(s_{1,1} - \frac{s_{1,0}\, s_{0,1}}{n}\right)$$
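This identity is easy to confirm numerically; below is a minimal numpy sketch (the data are arbitrary random draws, and `np.cov` is used only as an independent reference):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
x, y = rng.normal(size=n), rng.normal(size=n)   # arbitrary data

# power sums s_{r,t} = sum_i x_i^r * y_i^t
s11, s10, s01 = np.sum(x * y), np.sum(x), np.sum(y)

w_power_sum = (s11 - s10 * s01 / n) / (n - 1)
w_direct = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# all three values agree (np.cov uses the same n-1 divisor by default)
print(w_power_sum, w_direct, np.cov(x, y)[0, 1])
```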

We seek $\text{Var}(W)$. Since the variance is the second central moment of $W$, it can be obtained with the mathStatica (for Mathematica) package, which returns the general solution sol:

$$\text{Var}(W) \;=\; \frac{\mu _{2,2}}{n} \;-\; \frac{(n-2)\,\mu _{1,1}^2}{n(n-1)} \;+\; \frac{\mu _{2,0}\,\mu _{0,2}}{n(n-1)}$$

where:

  • $\mu _{r,s}$ denotes the product central moment:

$$\mu _{r,s}=E\left[(X-E[X])^r (Y-E[Y])^s\right]$$

For example, $\mu_{1,1} = \text{Cov}(X,Y)$, $\mu_{2,0}= \text{Var}(X)$ and $\mu_{0,2}= \text{Var}(Y)$.

This is a general solution for any bivariate distribution whose moments exist.
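As a sanity check of the general result, the following numpy sketch (an assumed, deliberately non-Normal construction with $X \sim \text{Exp}(1)$ and $Y = X + \text{Exp}(1)$; all sizes are illustrative) estimates the product central moments from one large sample, plugs them into the formula above, and compares against the empirical variance of $W$ over many replications:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw(size):
    # a deliberately non-Normal, dependent pair: X ~ Exp(1), Y = X + Exp(1)
    x = rng.exponential(size=size)
    return x, x + rng.exponential(size=size)

# estimate the population product central moments mu_{r,s} from one very large sample
bx, by = draw(2_000_000)
dx, dy = bx - bx.mean(), by - by.mean()

def mu(r, s):
    return np.mean(dx**r * dy**s)

m11, m22, m20, m02 = mu(1, 1), mu(2, 2), mu(2, 0), mu(0, 2)

# plug the estimated moments into the general solution, for samples of size n
n = 20
var_formula = m22 / n + m20 * m02 / (n * (n - 1)) - (n - 2) * m11**2 / (n * (n - 1))

# empirical Var(W) across many replications of samples of size n
reps = 100_000
x, y = draw((reps, n))
w = np.sum((x - x.mean(axis=1, keepdims=True)) *
           (y - y.mean(axis=1, keepdims=True)), axis=1) / (n - 1)

print("general formula:", var_formula)
print("simulation:     ", w.var(ddof=1))
```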

Special case of bivariate Normal

In the case of a bivariate Normal with variance-covariance matrix:

$$\Sigma = \left( \begin{array}{cc} \sigma _{11}^2 & \rho \sigma _{11} \sigma _{22} \\ \rho \sigma _{11} \sigma _{22} & \sigma _{22}^2 \\ \end{array} \right)$$

... the central moments $\mu_{i,j}$ that appear in the general solution are:

$$\mu _{2,0}=\sigma _{11}^2, \qquad \mu _{0,2}=\sigma _{22}^2, \qquad \mu _{1,1}=\rho\, \sigma _{11} \sigma _{22}, \qquad \mu _{2,2}=\left(1+2 \rho ^2\right) \sigma _{11}^2 \sigma _{22}^2$$

(the value of $\mu_{2,2}$ follows from Isserlis' theorem for zero-mean jointly Normal variables: $E[X^2Y^2] = E[X^2]E[Y^2] + 2E[XY]^2$).
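These values, in particular $\mu_{2,2}$, can be verified by simulation; here is a minimal numpy sketch with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, s1, s2 = 0.4, 1.2, 2.0                       # arbitrary illustrative values
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]

# since the mean is zero, the central moments coincide with the raw moments
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000).T

print("empirical mu_{1,1}:", np.mean(x * y),
      "  vs  rho*s1*s2 =", rho * s1 * s2)
print("empirical mu_{2,2}:", np.mean(x**2 * y**2),
      "  vs  (1+2*rho^2)*s1^2*s2^2 =", (1 + 2 * rho**2) * s1**2 * s2**2)
```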

Substituting these values into the general solution sol yields $\text{Var}(W)$ in the bivariate Normal case as:

$$\text{Var}(W) \;=\; \frac{\left(1+\rho ^2\right) \sigma _{11}^2\, \sigma _{22}^2}{n-1}$$

I am not sure whether this is the same as the OP's posted solution of $\frac{1}{n-1}(\sigma_{ij}^2 + \sigma_{ii}\sigma_{jj})$, because the notation intended for $\sigma_{ij}$ is not tied down and appears inconsistent with the notation the OP uses for the diagonal case, which the OP states correctly as $\text{Var}(s^2) = \frac{2\sigma^4}{n-1}$.

In summary, for the Normal case, the element-wise variances of the sample variances and covariances are:

$$ \text{Var} \left( \begin{array}{cc} \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)^2 & \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i-\bar Y) \\ \frac 1 {n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i-\bar Y) & \frac 1 {n-1} \sum_{i=1}^n (Y_i-\bar Y)^2 \\ \end{array} \right)$$

$$= \frac{1}{n-1}\left( \begin{array}{cc} 2 \sigma _{11}^4 & \left(1+\rho ^2\right) \sigma _{11}^2 \sigma _{22}^2 \\ \left(1+\rho ^2\right) \sigma _{11}^2 \sigma _{22}^2 & 2 \sigma _{22}^4 \\ \end{array} \right)$$
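The off-diagonal entry can likewise be confirmed by simulation; here is a minimal numpy sketch (the parameter values are arbitrary illustrative choices, not taken from the question):

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho, s1, s2 = 15, 0.6, 1.5, 0.8               # arbitrary illustrative values
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

# draw `reps` samples of size n from N(0, Sigma) and compute the sample covariance W of each
reps = 200_000
xy = rng.multivariate_normal([0.0, 0.0], Sigma, size=(reps, n))   # shape (reps, n, 2)
x, y = xy[..., 0], xy[..., 1]
w = np.sum((x - x.mean(axis=1, keepdims=True)) *
           (y - y.mean(axis=1, keepdims=True)), axis=1) / (n - 1)

print("empirical Var(W):          ", w.var(ddof=1))
print("(1+rho^2)*s1^2*s2^2/(n-1): ", (1 + rho**2) * s1**2 * s2**2 / (n - 1))
```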
