Solved – Variance of a sample of random variables

hypothesis-testing, random-variable, sample, variance

I have a sample of 100 items, each associated with a random variable for which I can compute the expected value and variance: $X_1, X_2, \dots, X_{100}$. From these, we can define the mean $\overline{X}=\frac{1}{100}\sum_{i=1}^{100} X_i$. I'd like to test the hypothesis $H_0:\mu=0$ (where $\mu$ is the true population mean from which the 100 items were sampled), but for that I need the variance of the sample of $X_i$'s.

So the question is: how do I compute the variance of the sample, given the individual expectations and variances of the random variables?

Thanks a lot!

EDIT: More info about the question

I have a set of 100 items, and there is a function that assigns a score to each of them. The problem is that computing that function is very expensive, so instead I have a process with which I can estimate the score: the more effort (e.g. money) I put into the process, the better the estimate (lower variance). So the variance of each estimate starts at its maximum, decreases as I put more effort into the process, and reaches 0, with the expectation equal to the actual score, if I run the process to completion (i.e. compute the original function).

Those 100 items are just a random sample from a wider population of items, so I'd like to test the hypothesis that the population score is different from (or larger than) zero.

Best Answer

This is a problem very often encountered in biology, where a number of independent experiments (100 in your case) are sampled IID, each with its own unknown mean. The only thing one can do is estimate those means, again by IID sampling. Typically, the variable $X_i$ is estimated from a sample of size $n_i$, so the variance of the estimate is $\sigma_i^2 = \sigma^2/n_i$, where $\sigma^2$ is the sampling variance, not the variance of $X$. Because each individual observation in experiment $i$ can be written in the form $X_i + \varepsilon_{ij}$, the sample mean $\bar{X}_i$ has variance $V + \sigma^2/n_i$, where $V$ is the variance of $X$.
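Spelling that decomposition out (assuming the $\varepsilon_{ij}$ are i.i.d. with mean 0 and variance $\sigma^2$, independent of $X_i$):

$$
\operatorname{Var}(\bar{X}_i)
= \operatorname{Var}\!\Big(X_i + \frac{1}{n_i}\sum_{j=1}^{n_i}\varepsilon_{ij}\Big)
= \operatorname{Var}(X_i) + \frac{1}{n_i^2}\sum_{j=1}^{n_i}\operatorname{Var}(\varepsilon_{ij})
= V + \frac{\sigma^2}{n_i}.
$$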

You can compute the grand mean as $\bar{X} = \frac{1}{n}\sum_{i=1}^{100} n_i \bar{X}_i$ (where $n = \sum_{i=1}^{100} n_i$), which is a sum of the independent variables $\frac{n_i}{n}\bar{X}_i$. Each of these has variance $\frac{n_i^2}{n^2}V + \frac{n_i}{n^2}\sigma^2$, so summing gives $\operatorname{Var}(\bar{X}) = \frac{\sum_{i=1}^{100} n_i^2}{n^2}\,V + \frac{\sigma^2}{n}$.
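As a minimal sketch of how this formula could be used for the test of $H_0:\mu=0$, here is some Python. The function name `grand_mean_test` is mine, and it assumes `V` and `sigma2` are known (or have been estimated elsewhere), which the answer does not address:

```python
import numpy as np
from scipy import stats

def grand_mean_test(xbar_i, n_i, V, sigma2):
    """Weighted grand mean and a z-test of H0: mu = 0.

    xbar_i : per-experiment sample means (length 100 in the question)
    n_i    : per-experiment sample sizes
    V      : variance of the X_i themselves (between-item variance)
    sigma2 : within-experiment sampling variance
    """
    xbar_i = np.asarray(xbar_i, dtype=float)
    n_i = np.asarray(n_i, dtype=float)
    n = n_i.sum()

    # Grand mean: (1/n) * sum(n_i * xbar_i)
    grand_mean = np.sum(n_i * xbar_i) / n

    # Var(grand mean) = V * sum(n_i^2) / n^2 + sigma2 / n
    var_grand_mean = V * np.sum(n_i**2) / n**2 + sigma2 / n

    # z-statistic for H0: mu = 0 (normal approximation)
    z = grand_mean / np.sqrt(var_grand_mean)
    p_two_sided = 2 * stats.norm.sf(abs(z))
    return grand_mean, var_grand_mean, z, p_two_sided

# Toy usage with made-up numbers, just to show the call:
rng = np.random.default_rng(0)
n_i = rng.integers(5, 20, size=100)
xbar_i = rng.normal(0.0, 1.0, size=100)
print(grand_mean_test(xbar_i, n_i, V=1.0, sigma2=4.0))
```

Note that a one-sided test (population score larger than zero, as the EDIT suggests) would use `stats.norm.sf(z)` instead of the two-sided p-value.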
