[Math] unbiased estimator for sample covariance

estimation, statistics

I'm new to statistics and I need some help:

Let $X_1,\dots,X_n \sim N(\mu_x,\sigma^2)$ and $Y_1,\dots,Y_m \sim N(\mu_y,\sigma^2)$.
All the random variables are independent, and $\mu_x,\mu_y,\sigma$ are unknown.

I was told that $S_p^2=\frac{(n-1)S_x^2+(m-1)S_y^2}{n+m-2}$ is an unbiased estimator for the sample covariance.

I think I'm confused about the definitions, because to the best of my knowledge covariance cannot be calculated when $n \neq m$. Am I correct? If so, what is the above actually an unbiased estimator of?

Best Answer

You are correct. If you cannot match up realizations from $X$ with realizations from $Y$, then it is impossible to estimate how $X$ and $Y$ vary together; i.e., their covariance. What is required to estimate covariance are pairs of realizations between the variables.
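To make the pairing point concrete, here is a minimal Python sketch (assuming NumPy; the helper name `sample_covariance` and the toy data are made up for illustration). Reordering one sample leaves each sample's variance unchanged but flips the sign of the sample covariance, so the covariance is a property of the pairs, not of the two samples taken separately.

```python
import numpy as np


def sample_covariance(x, y):
    """Unbiased sample covariance of paired observations (x_i, y_i).

    Hypothetical helper for illustration; it refuses unpaired input.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("covariance needs pairs: len(x) must equal len(y)")
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)


# Toy data: the same y values, paired with x in two different orders.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
y_reversed = y[::-1]

cov_original = sample_covariance(x, y)            # positive
cov_reversed = sample_covariance(x, y_reversed)   # negative, same magnitude

# The marginal sample variance of y does not care about the ordering.
var_original = np.var(y, ddof=1)
var_reversed = np.var(y_reversed, ddof=1)
```

With unequal sample sizes there is no pairing at all, which is why the helper raises an error rather than returning a number.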

The estimator you cite is the pooled variance of two samples assumed to be drawn from distributions with possibly different means, but with the same variance. That is to say, your $S_p^2$ is an unbiased estimator of $\sigma^2$ if $X$ and $Y$ are independent and normally distributed with different means but the same variance. But if $X$ and $Y$ are the marginals of a bivariate normal distribution with unknown mean vector $\boldsymbol \mu = (\mu_x, \mu_y)$ and covariance matrix $$\boldsymbol \Sigma = \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{bmatrix},$$ where in your case we might assume $\sigma_x^2 = \sigma_y^2 = \sigma^2$, then $S_p^2$ does not in any way estimate $\sigma_{xy}$. In fact, unpaired samples of sizes $n$ and $m$ cannot be used to estimate the covariance at all, for the reason given in the previous paragraph.
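Unbiasedness for $\sigma^2$ follows from $E[S_x^2] = E[S_y^2] = \sigma^2$, which gives $E[S_p^2] = \frac{(n-1)\sigma^2 + (m-1)\sigma^2}{n+m-2} = \sigma^2$. As a rough numerical check, here is a small simulation sketch in Python (assuming NumPy; the function names, sample sizes, and parameter values are arbitrary choices for illustration, not part of the question).

```python
import numpy as np


def pooled_variance(x, y):
    """Pooled variance S_p^2 of two samples, possibly of different sizes."""
    n, m = len(x), len(y)
    sx2 = np.var(x, ddof=1)  # unbiased sample variance of x
    sy2 = np.var(y, ddof=1)  # unbiased sample variance of y
    return ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)


def mean_pooled_variance(n=5, m=8, mu_x=1.0, mu_y=-2.0, sigma=3.0,
                         reps=20000, seed=0):
    """Average S_p^2 over many simulated pairs of independent samples.

    All defaults are illustrative assumptions; sigma**2 = 9 is the target.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sigma, size=(reps, n))  # each row is one X sample
    y = rng.normal(mu_y, sigma, size=(reps, m))  # each row is one Y sample
    sx2 = x.var(axis=1, ddof=1)
    sy2 = y.var(axis=1, ddof=1)
    sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)
    return sp2.mean()


estimate = mean_pooled_variance()
```

Under these assumptions the average of the simulated $S_p^2$ values should land close to $\sigma^2 = 9$, even though $n \neq m$ and the two means differ.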
