ANOVA – Estimating Population Variance from a Set of Means

Tags: anova, partitioning, standard deviation, weighted mean

I have a set of measurements which is partitioned into $M$ partitions. However, I only have the partition sizes $N_i$ and the means $\bar{x}_i$ of each partition. Because all measurements are assumed to come from the same distribution, I believe I can estimate the mean of the population, $\bar{y}$, and the standard deviation of the mean, $\sigma_{mean}$, as follows:
$$
N=\sum_{i=1}^M N_i
$$
$$
\bar{y} = \frac{1}{N}\sum_{i=1}^MN_i\bar{x}_i
$$

$$
\sigma_{mean}=\sqrt{\frac{1}{N}\sum_i N_i(\bar{x}_i-\bar{y})^2}
$$
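For concreteness, here is a minimal sketch of that computation (the partition sizes and means are hypothetical):

```python
import math

# Hypothetical partition sizes N_i and per-partition means x̄_i.
sizes = [4, 6, 10]
means = [2.0, 2.5, 1.8]

N = sum(sizes)                                        # total number of measurements
y_bar = sum(n * m for n, m in zip(sizes, means)) / N  # weighted mean of the means
sigma_mean = math.sqrt(
    sum(n * (m - y_bar) ** 2 for n, m in zip(sizes, means)) / N
)
print(N, y_bar, sigma_mean)  # → 20 2.05 0.304...
```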

My questions:

  1. Am I right in my assumptions, that the mean $\bar{y}$ can be computed as above?
  2. How can I find the standard deviation for the population, given only the means?
    I read that the standard deviation of the population and the standard deviation of the mean are related by
    $$
    \sigma_{mean}=\frac{\sigma}{\sqrt{n}} \mbox{[1]}
    $$
    where $n$ is the number of samples used in the computation of $\bar{x}_i$. So is it really as simple as multiplying $\sigma_{mean}$ by $\sqrt{n}$, if $n$ is the same for all means?
  3. If it's that simple, what do I do if each $\bar{x}_i$ is computed using a different number of samples?

[1] Wikipedia:Standard Deviation

Best Answer

Let $X_i$ be the mean of $N_i$ independent draws from some unknown distribution $F$ having mean $\mu$ and standard deviation $\sigma$. Altogether these values represent $N=N_1+N_2+\cdots+N_k$ draws. It follows from these assumptions that each $X_i$ has expectation $\mu$ and variance $\sigma^2/N_i$.

Part of the question proposes estimating $\mu$ from these data as

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^k N_i X_i.$$

We can verify that this is a good estimate. First, it is unbiased:

$$E[\hat{\mu}] = E\left[\frac{1}{N}\sum_{i=1}^k N_i X_i\right] = \frac{1}{N}\sum_{i=1}^k N_i \mu = \mu.$$

Second, its estimation variance is low. To compute this we find the second moment:

$$\begin{align} E\left[\hat{\mu}^2\right] &= E\left[\frac{1}{N^2}\sum_{i,j}N_i N_j X_i X_j\right]\\ &= \frac{1}{N^2}\left(\sum_{i\ne j} N_i N_j \mu^2 + \sum_i N_i^2\left(\mu^2 + \frac{\sigma^2}{N_i}\right)\right)\\ &= \mu^2 + \sigma^2/N, \end{align}$$

using independence ($E[X_iX_j]=\mu^2$ for $i\ne j$) and $\sum_{i,j} N_iN_j = N^2$.

Subtracting the square of the first moment shows that the sampling variance of $\hat{\mu}$ equals $\sigma^2/N$. This is as low as an unbiased linear estimator can possibly get, because it equals the sampling variance of the mean of the $N$ (unknown) values from which the $X_i$ were formed; that sampling variance is known to be minimum among all unbiased linear estimators; and any linear combination of the $X_i$ is a fortiori a linear combination of the $N$ underlying values.
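A quick Monte Carlo check of both properties (unbiasedness and sampling variance $\sigma^2/N$); the group sizes are hypothetical and $F$ is taken to be normal purely for the simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = np.array([4, 6, 10])      # hypothetical group sizes N_i
N = sizes.sum()                   # N = 20
mu, sigma = 5.0, 2.0              # true mean and sd of F (normal assumed here)

# Each row simulates one replication of the k group means X_i,
# where X_i ~ Normal(mu, sigma / sqrt(N_i)).
reps = 200_000
X = rng.normal(mu, sigma / np.sqrt(sizes), size=(reps, len(sizes)))
mu_hats = X @ sizes / N           # weighted estimator, one value per replication

print(mu_hats.mean())             # ≈ mu = 5
print(mu_hats.var())              # ≈ sigma**2 / N = 0.2
```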

To address the other parts of the question, let us seek an unbiased estimator of the variance $\sigma^2$ in the form of a weighted sample variance. Write the weights as $\omega_i$. Computing in a similar vein we obtain

$$\begin{align} E\left[\widehat{\sigma^2}\right] &= E\left[\sum_i \omega_i(X_i-\hat{\mu})^2\right] \\ &= \sum_i \omega_i E\left[X_i^2 - \frac{2}{N}\sum_j N_j X_i X_j + (\hat{\mu})^2\right]\\ &= \sum_i \omega_i \left((\mu^2 + \sigma^2 / N_i)\left(1 - 2\frac{N_i}{N}\right) - \frac{2}{N}\sum_{j\ne i} N_j \mu^2 + (\mu^2 + \sigma^2/N)\right)\\ &= \sigma^2 \sum_i \omega_i\left(\frac{1}{N_i} - \frac{1}{N}\right). \end{align}$$

A natural choice (inspired by ANOVA calculations) is $$\omega_i = \frac{N_i}{k-1}.\tag{*}$$ For indeed,

$$E\left[\widehat{\sigma^2}\right] = \sigma^2 \sum_{i=1}^k \frac{N_i}{k-1}\left(\frac{1}{N_i} - \frac{1}{N}\right) = \sigma^2 \frac{1}{k-1}\sum_{i=1}^k \left(1 - \frac{N_i}{N}\right) = \sigma^2\frac{k-1}{k-1} = \sigma^2.$$

This at least makes $\widehat{\sigma^2}$ unbiased. When $k>2$, there are many other choices of weights that give unbiased estimators. When the group sizes are equal, it's easy to show that this choice gives a minimum-variance unbiased estimator. In general, though, it appears that the MVUE depends on the first four moments of $F$. (I may have done the algebra wrong, but I'm getting some complicated results for the general case.) Regardless, the weights given here should not be far from optimal.

As a concrete example, suppose that each of $X_1$, $X_2$, and $X_3$ is the average of $N_i=4$ draws. Then $N=12$, $k=3$, and the weights as given in formula $(*)$ are all given by $\omega_i = \frac{4}{3-1}=2$. Consequently we should estimate $$\widehat{\sigma^2} = 2((X_1-\hat{\mu})^2 + (X_2-\hat{\mu})^2 + (X_3-\hat{\mu})^2)$$ and, of course, $$\hat{\mu} = \frac{1}{12}(4X_1 + 4X_2 + 4X_3) = (X_1+X_2+X_3)/3.$$
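A simulation of this example suggests the estimator is indeed unbiased (here $F$ is taken to be normal and its parameters are arbitrary; only the structure $k=3$, $N_i=4$ comes from the example):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 3.0       # true parameters of F (normal assumed here)
k, n = 3, 4                # three groups, four draws each
w = n / (k - 1)            # weight from (*): 4/2 = 2

# Simulate many replications: draw the k*n values, form the k group means,
# then apply the weighted-variance estimator.
reps = 100_000
X = rng.normal(mu, sigma, size=(reps, k, n)).mean(axis=2)   # group means X_i
mu_hat = X.mean(axis=1, keepdims=True)                      # equal sizes: plain mean
est = w * ((X - mu_hat) ** 2).sum(axis=1)

print(est.mean())          # ≈ sigma**2 = 9
```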
