Let's assume you have a population of size $N$ with values $x_1,\ldots,x_N$, mean $\bar x=\frac{1}{N}\sum_{i=1}^N x_i$ and variance $\sigma^2=\frac{1}{N}\sum_{i=1}^N(x_i-\bar x)^2$. (Note that I use lower case $x_i$ to indicate these are not random, but fixed values.)
Now, let's take a random sample $Y_1,\ldots,Y_n$ of $n$ elements (without replacement), with all such subsets equally likely. (Now I use capital $Y$ to indicate these are random.)
Now let $\bar Y=\frac{1}{n}\sum_{i=1}^n Y_i$ and $V=\sum_{i=1}^n (Y_i-\bar Y)^2$, so that the sample variance would be $V/n$ (matching the expression for $\sigma^2$). If we write $V$ out in terms of $(Y_i-\bar x)^2$ and $(Y_i-\bar x)(Y_j-\bar x)$, we get
$$
\begin{split}
V
=& \sum_{i=1}^n (Y_i-\bar Y)^2
= \sum_{i=1}^n \left[(Y_i-\bar x)-(\bar Y-\bar x)\right]^2 \\
=& \sum_{i=1}^n \left[(Y_i-\bar x)^2-2(Y_i-\bar x)(\bar Y-\bar x)+(\bar Y-\bar x)^2 \right] \\
=& \sum_{i=1}^n (Y_i-\bar x)^2 - n(\bar Y-\bar x)^2 \\
=& \left(1-\frac{1}{n} \right) \sum_{i=1}^n (Y_i-\bar x)^2
-\frac{2}{n}\sum_{1\le i<j\le n} (Y_i-\bar x)(Y_j-\bar x)
\end{split}
$$
where in the last step we use that
$$
\left(\sum_{i=1}^n (Y_i-\bar x)\right)^2
= \sum_{i=1}^n (Y_i-\bar x)^2 + 2\sum_{1\le i<j\le n} (Y_i-\bar x)(Y_j-\bar x).
$$
We know that $\text{E}[(Y_i-\bar x)^2]=\sigma^2$: since $Y_i$ is equally likely to be any of $x_1,\ldots,x_N$, this is just the average of $(x_k-\bar x)^2$ over the population.
For $i<j$, we can compute $\text{E}[(Y_i-\bar x)(Y_j-\bar x)]$ by using that this is the same as the average of $(x_k-\bar x)(x_l-\bar x)$ over all pairs $1\le k<l\le N$, since the pair $(Y_i,Y_j)$ is equally likely to be any pair of distinct population values. Since $\sum_{k=1}^N (x_k-\bar x)=0$, we get
$$
0 = \left(\sum_{k=1}^N (x_k-\bar x)\right)^2
= \sum_{1\le k,l\le N} (x_k-\bar x)(x_l-\bar x)
= \sum_{k=1}^N (x_k-\bar x)^2 + 2\sum_{1\le k<l\le N} (x_k-\bar x)(x_l-\bar x)
$$
so that $\sum_{1\le k<l\le N} (x_k-\bar x)(x_l-\bar x) = -\tfrac{N\sigma^2}{2}$. Averaging over the $\binom{N}{2}=\tfrac{N(N-1)}{2}$ pairs gives, for $i<j$,
$$
\text{E}\left[(Y_i-\bar x)(Y_j-\bar x)\right]
= \frac{-N\sigma^2/2}{N(N-1)/2}
= -\frac{\sigma^2}{N-1}.
$$
Combining these results (the expression for $V$ has $n$ squared terms and $\tfrac{n(n-1)}{2}$ cross terms), we get
$$
\text{E}[V] = \left(1-\frac{1}{n}\right)n\sigma^2
- \frac{2}{n}\cdot\frac{n(n-1)}{2}\cdot\left(-\frac{\sigma^2}{N-1}\right)
= (n-1)\sigma^2 + \frac{n-1}{N-1}\sigma^2
= \frac{(n-1)N}{N-1}\sigma^2
$$
giving an unbiased estimator
$$
\hat\sigma^2 = \frac{N-1}{N(n-1)}V
= \frac{N-1}{N(n-1)} \sum_{i=1}^n (Y_i-\bar Y)^2.
$$
As $N\rightarrow\infty$, you recover the familiar $s^2$ estimator, which corresponds to independent sampling from a distribution, while $n=N$ gives exactly $\sigma^2$, as it should when the $x_i$ are known for the whole population.
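As a quick numerical check (a sketch in R; the population, sizes, and seed below are arbitrary illustrative choices, not part of the derivation), one can draw many samples without replacement and compare the average of $\hat\sigma^2$ against $\sigma^2$:

set.seed(101); N = 50; n = 8; B = 10^5        # B: number of simulated samples
x = rnorm(N, 100, 10)                         # one fixed population
sg.sq = mean((x - mean(x))^2)                 # population variance (denominator N)
est = replicate(B, {
  y = sample(x, n)                            # simple random sample, without replacement
  (N-1)/(N*(n-1)) * sum((y - mean(y))^2) })   # the estimator derived above
mean(est); sg.sq                              # the two should agree closely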
First, your notation for the sample variance seems to be muddled. The sample variance is ordinarily defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ which makes it an unbiased estimator of the population variance $\sigma^2.$
Perhaps the most common context for an 'unbiased pooled estimator' of variance is the 'pooled t test': Suppose you have two random samples $X_i$ of size $n$ and $Y_i$ of size $m$ from populations with the same variance $\sigma^2.$ Then
the pooled estimator of $\sigma^2$ is
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{m+n-2}.$$
This estimator is unbiased.
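Indeed, since $E[S_X^2] = E[S_Y^2] = \sigma^2,$ linearity of expectation gives
$$E[S_p^2] = \frac{(n-1)E[S_X^2] + (m-1)E[S_Y^2]}{m+n-2}
= \frac{(n-1)\sigma^2 + (m-1)\sigma^2}{m+n-2} = \sigma^2.$$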
Because the samples have respective 'degrees of freedom' $n-1$ and $m-1,$ one sometimes says that $S_p^2$ is a 'degrees-of-freedom' weighted average of
the two sample variances. If $n = m,$ then $S_p^2 = 0.5S_X^2 + 0.5S_Y^2.$
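A brief simulation sketch of this unbiasedness (the seed, sample sizes, and means below are arbitrary illustrative choices, with a common $\sigma^2 = 100$):

set.seed(2345); B = 10^5; n = 5; m = 8; sigma = 10
s.p = replicate(B, {
  x = rnorm(n, 50, sigma); y = rnorm(m, 70, sigma)   # same variance, different means
  ((n-1)*var(x) + (m-1)*var(y))/(n+m-2) })           # pooled estimator S_p^2
mean(s.p)                                            # should be near sigma^2 = 100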
Note: Some authors do define the sample variance as $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2,$ but then the sample variance is not an unbiased estimator of $\sigma^2,$ even though it might have other properties desirable for the author's task at hand. However, most agree that the notation $S^2$ is reserved for the version with $n-1$ in the denominator, unless a specific warning is given otherwise.
Example: One common measure of the 'goodness' of an estimator is that it have a small 'root mean squared error' (RMSE). If $T$ is an estimator of $\tau,$ then $\text{MSE}_T(\tau) = E[(T-\tau)^2],$ and the RMSE is its square root.
The simulation below illustrates, for normal data with $n = 5$ and $\sigma^2 = 10^2 = 100,$ that the version of the sample variance with $n$ in the denominator has smaller RMSE than the version with $n-1$ in the denominator. (A formal proof for $n > 1$ is not difficult.)
set.seed(1888); m = 10^6; n = 5; sigma = 10; sg.sq = 100
v.a = replicate(m, var(rnorm(n, 100, sigma))) # denom n-1
v.b = (n-1)*v.a/n # denom n
mean(v.a); RMS.a = sqrt(mean((v.a-sg.sq)^2)); RMS.a
[1] 100.0564 # unbiased
[1] 70.81563 # larger RMSE
mean(v.b); RMS.b = sqrt(mean((v.b-sg.sq)^2)); RMS.b
[1] 80.0451 # biased
[1] 60.06415 # smaller RMSE
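For comparison, the exact values under normality follow from $\text{Var}(S^2) = 2\sigma^4/(n-1)$:
$$\text{RMSE}(S^2) = \sigma^2\sqrt{\tfrac{2}{n-1}} \approx 70.71,
\qquad
\text{RMSE}\!\left(\tfrac{n-1}{n}S^2\right) = \frac{\sigma^2\sqrt{2n-1}}{n} = 60,$$
for $n = 5$ and $\sigma^2 = 100,$ in close agreement with the simulated values above.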
Best Answer
Correct. Once you find the critical point, it suffices to show it corresponds to a minimum by considering the second derivative at this value. You will also find that the minimum variance attained should be $$\frac{4\sigma^2}{m + 4n}.$$