[Math] Probability of a sample mean taking various values

probability, statistics

How do you determine the probability of a sample mean being a certain value?

To make this more concrete, suppose we have a population of 1000 values. The population mean is 50 and the standard deviation is 5.

How do we determine the probability that 20 randomly chosen individuals from that population have a sample mean of 40 and a standard deviation of 6? What about the probability of them all being less than 40?

No, it's not homework. But I imagine this is a sort of common thing to solve, so if anyone could just point me in the direction of an online resource that talks about the relevant topic, that would be great.

Best Answer

Say you select samples randomly from the same population. The sample mean $\hat{x}$ is the random variable (a value produced by the random process of selection) defined as the mean of the values in a given sample. For each sample $s$ of size $N$, with values $x_{s1}, x_{s2}, \ldots, x_{sN}$, you calculate:

$$x_s := \frac{x_{s1} + x_{s2} + \cdots + x_{sN}}{N}$$

The Central Limit Theorem (CLT) then says (among other things) that, if the sample size $N$ is large enough, the distribution of these sample means $x_s$, taken over all possible random samples of size $N$, is approximately normal, with mean equal to the population mean and standard deviation equal to the population standard deviation divided by $\sqrt{N}$.
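A quick way to see this statement in action is a simulation. The sketch below is only an illustration under stated assumptions: it builds a hypothetical population of 1000 values, rescaled to have exactly mean 50 and standard deviation 5 (the numbers from the question), draws many samples of size $N = 20$ *with replacement* (so the finite-population correction is ignored), and compares the mean and standard deviation of the resulting sample means with the population mean and $\sigma/\sqrt{N}$. The helper names `make_population` and `simulate_sample_means` are hypothetical.

```python
import math
import random
import statistics


def make_population(size=1000, mu=50.0, sigma=5.0, seed=0):
    """Hypothetical population, rescaled to have exactly mean mu and population sd sigma."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(size)]
    m = statistics.fmean(raw)
    s = statistics.pstdev(raw)
    return [mu + sigma * (x - m) / s for x in raw]


def simulate_sample_means(population, n, reps, seed=1):
    """Draw `reps` samples of size n (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]


population = make_population()
N = 20
means = simulate_sample_means(population, N, reps=20_000)

print(f"population mean      = {statistics.fmean(population):.3f}")
print(f"mean of sample means = {statistics.fmean(means):.3f}")
print(f"sigma / sqrt(N)      = {statistics.pstdev(population) / math.sqrt(N):.3f}")
print(f"sd of sample means   = {statistics.stdev(means):.3f}")
```

The mean of the simulated sample means should land very close to 50, and their standard deviation close to $5/\sqrt{20} \approx 1.118$.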

In other words, the values of $x_s$ are (approximately) normally distributed with mean $\mu_s = \mu_{pop}$, where $\mu_{pop}$ is the true population mean, i.e., the value you would get if you measured every single member of your population and divided the total by the population size (which is often impractical or too costly).
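Applied to the numbers in the question, this gives a way to gauge how likely a sample mean as low as 40 is. Under a continuous normal approximation the probability of hitting exactly 40 is zero, so the sketch below computes the tail probability $P(\text{sample mean} \le 40)$ from the z-score $z = (40 - \mu_{pop})/(\sigma/\sqrt{N})$. It assumes the normal approximation holds, that the population standard deviation $\sigma = 5$ is known, and it ignores the finite-population correction (small here, since only 20 of 1000 values are drawn). The function name `prob_sample_mean_at_most` is hypothetical; it uses `math.erfc`, which stays accurate far out in the tail.

```python
import math


def prob_sample_mean_at_most(x, mu, sigma, n):
    """Return (z, P(sample mean <= x)) under the CLT normal approximation."""
    standard_error = sigma / math.sqrt(n)
    z = (x - mu) / standard_error
    # Standard normal CDF: Phi(z) = 0.5 * erfc(-z / sqrt(2))
    return z, 0.5 * math.erfc(-z / math.sqrt(2))


z, p = prob_sample_mean_at_most(40, mu=50, sigma=5, n=20)
print(f"z = {z:.3f}, P(sample mean <= 40) = {p:.3e}")
```

Here $z = -2\sqrt{20} \approx -8.94$: a sample mean of 40 sits almost nine standard errors below 50, so the probability is astronomically small (on the order of $10^{-19}$).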

Now say (assuming we don't know the true value of the population mean; otherwise there would be no point in collecting sample data) that you collect a sample $s'$ of size $N$ from your population and get a sample mean $x_{s'}$. Unless you know the value of $\mu_{pop}$ in advance, there is no way of knowing whether it equals $x_{s'}$; the best you can do is use the CLT to find an interval centered at $x_{s'}$ that contains $\mu_{pop}$ with a certain probability. Specifically, $x_{s'}$ lies some number $k$ of standard deviations (of the sample mean, i.e., $\sigma/\sqrt{N}$) away from the mean. Because the sample mean is (approximately) normally distributed, you can compute the probability of obtaining a value within $k$ standard deviations of the mean, and this gives you the interval. Let me expand on this a little later.
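The interval construction described in this paragraph can be written out roughly as follows. This is a minimal sketch, assuming the population standard deviation is known (hypothetically $\sigma = 5$) and using a made-up observed sample mean of 49.2 from a sample of $N = 20$; the function name `mean_interval` is hypothetical. It picks $k$ so that a standard normal variable falls within $\pm k$ with the requested probability ($k \approx 1.96$ for 95%), then reports $x_{s'} \pm k\,\sigma/\sqrt{N}$.

```python
import math
from statistics import NormalDist


def mean_interval(sample_mean, sigma, n, level=0.95):
    """Return (k, (low, high)) for the interval sample_mean +/- k * sigma / sqrt(n)."""
    # k such that P(-k <= Z <= k) = level for a standard normal Z
    k = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = k * sigma / math.sqrt(n)
    return k, (sample_mean - half_width, sample_mean + half_width)


# Hypothetical observed sample: mean 49.2 from N = 20 draws, sigma = 5 assumed known
k, (low, high) = mean_interval(49.2, sigma=5, n=20)
print(f"k = {k:.3f}, interval = ({low:.2f}, {high:.2f})")
```

With these made-up numbers the interval is roughly (47.01, 51.39). If $\sigma$ is not known, a common substitute is the sample standard deviation together with Student's t distribution, which this sketch does not cover.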