[Math] unbiased estimator of sample variance using two samples

probability, sampling, statistics

I have a couple of questions, and I'm hoping someone can help!
Let $X_1,\dots,X_n$ be a random i.i.d. sample from a $N(\mu,\sigma^2)$ distribution and $Y_1,\dots,Y_m$ be a random i.i.d. sample from a $N(2\mu,\sigma^2)$ distribution, and further let the two samples be independent (with the quantities $\mu$ and $\sigma^2$ unknown).
I'm trying to do the following: construct an unbiased estimator $\hat{\mu}$ of $\mu$ using both samples, calculate $Var(\hat{\mu})$, and then use both samples to obtain an unbiased estimator of $\sigma^2$.

I think I understand the first two parts: we know $E\left(\frac{X_1+\dots+X_n}{n}\right) = \mu$ and $E\left(\frac{Y_1+\dots+Y_m}{m}\right) = 2\mu$, so I believe $\frac{X_1+\dots+X_n}{2n}+\frac{Y_1+\dots+Y_m}{4m}$ should provide an unbiased estimator of $\mu$, and from that it follows that $Var(\hat{\mu})=\sigma^2\left(\frac{1}{4n}+\frac{1}{16m}\right)$.
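
As a sanity check, here is a quick Monte Carlo sketch of this estimator (the values of $\mu$, $\sigma$, $n$ and $m$ below are arbitrary choices, just for illustration):

```python
import random

# Arbitrary illustration values (not part of the problem statement)
mu, sigma, n, m, trials = 1.5, 2.0, 10, 7, 200_000

est = []
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    y = [random.gauss(2 * mu, sigma) for _ in range(m)]
    # the proposed estimator: sum(X)/(2n) + sum(Y)/(4m)
    est.append(sum(x) / (2 * n) + sum(y) / (4 * m))

mean_est = sum(est) / trials
var_est = sum((e - mean_est) ** 2 for e in est) / (trials - 1)

print(mean_est, "vs", mu)  # empirical mean vs mu: should be close
print(var_est, "vs", sigma**2 * (1 / (4 * n) + 1 / (16 * m)))  # variance check
```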

What I'm not clear on is how to construct an unbiased estimator for the variance. I'm aware that $\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2$ provides an unbiased estimator of $\sigma^2$ (the proof is on Wikipedia). From this, it seems like $\frac{1}{2}\cdot\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2+\frac{1}{2}\cdot\frac{1}{n-1}\sum_{i=1}^m(Y_i-\bar{Y})^2$ should have expectation $\frac{\sigma^2}{2}+\frac{\sigma^2}{2}=\sigma^2$, but something about it makes me nervous, and I feel like this approach may be inherently flawed. Any help/suggestions would be greatly appreciated!
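
Here is a minimal simulation sketch I'd use to test the candidate exactly as written above (again with arbitrary parameter values, and deliberately with $n \ne m$):

```python
import random

# Arbitrary illustration values; note n != m on purpose
mu, sigma, n, m, trials = 1.5, 2.0, 10, 7, 200_000

def candidate(x, y):
    # combined estimator exactly as written above (n-1 in BOTH denominators)
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    sy = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return 0.5 * sx + 0.5 * sy

avg = sum(
    candidate([random.gauss(mu, sigma) for _ in range(n)],
              [random.gauss(2 * mu, sigma) for _ in range(m)])
    for _ in range(trials)
) / trials

print(avg, "vs", sigma**2)  # with n != m these disagree noticeably
```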
Thanks

Best Answer

Apart from the fact that it should be $m-1$ instead of $n-1$ in the right-hand denominator, your estimator for $\sigma^2$ looks fine. You can do slightly better on the variance of $\hat\mu$ (though the question didn't ask to optimize it): Consider a general convex combination

$$ \alpha\frac{X_1+\dotso+X_n}n+(1-\alpha)\frac{Y_1+\dotso+Y_m}{2m} $$

of the individual estimators for $\mu$. The variance of this combined estimator is

$$ n\left(\frac\alpha n\right)^2\sigma^2+m\left(\frac{1-\alpha}{2m}\right)^2\sigma^2=\left(\frac{\alpha^2}n+\frac{(1-\alpha)^2}{4m}\right)\sigma^2\;, $$

and minimizing this by setting the derivative with respect to $\alpha$ to zero, $\frac{2\alpha}n-\frac{1-\alpha}{2m}=0$, leads to $\alpha=n/(n+4m)$, yielding the variance $\sigma^2/(n+4m)$. For $n=m$ this is $\frac15\sigma^2/n=0.2\sigma^2/n$, compared to $\frac5{16}\sigma^2/n\approx0.3\sigma^2/n$ for your estimator; and for $n$ fixed and $m\to\infty$ (or vice versa), the variance of this estimator tends to zero, whereas the variance of your estimator tends to a non-zero value.
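
To illustrate the comparison numerically (the $(n,m)$ pairs below are arbitrary examples), one can tabulate both variance formulas divided by $\sigma^2$:

```python
# Var/sigma^2 of the two unbiased estimators of mu, from the formulas above
# (the (n, m) pairs are arbitrary examples).
for n, m in [(10, 10), (10, 100), (10, 10_000)]:
    yours = 1 / (4 * n) + 1 / (16 * m)  # the question's estimator
    best = 1 / (n + 4 * m)              # optimal convex combination
    print(n, m, round(yours, 5), round(best, 5))
```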

You could optimize the variance of your unbiased variance estimator in a similar way, though the calculation would be a bit more involved.
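
For instance, here is a sketch of that calculation in sympy, using the fact that $Var(S^2)=2\sigma^4/(k-1)$ for the usual unbiased variance estimator from a normal sample of size $k$:

```python
from sympy import symbols, diff, solve, simplify

a, n, m, s4 = symbols('alpha n m sigma4', positive=True)

# Variance of alpha*S_X^2 + (1-alpha)*S_Y^2, using
# Var(S^2) = 2*sigma^4/(k-1) for a normal sample of size k
v = a**2 * 2 * s4 / (n - 1) + (1 - a)**2 * 2 * s4 / (m - 1)

a_opt = solve(diff(v, a), a)[0]
print(simplify(a_opt))             # (n - 1)/(n + m - 2)
print(simplify(v.subs(a, a_opt)))  # 2*sigma4/(n + m - 2)
```

The optimal weight $(n-1)/(n+m-2)$ recovers exactly the usual pooled variance estimator $\bigl((n-1)S_X^2+(m-1)S_Y^2\bigr)/(n+m-2)$.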
