If $X$ represents here the sample mean $\bar X_n$, then the Central Limit Theorem says that the quantity
$$Z = \sqrt n(\bar X_n-\mu)$$ tends in distribution to $N(0,\sigma^2)$ as $n$ tends to infinity, and then by abusing notation and asymptotics, we write
$$ \bar X_n = \frac{1}{\sqrt n}Z + \mu$$ which gives us that $\bar X_n \approx N(\mu,( \frac{\sigma}{\sqrt n})^2) $.
...which in a sense holds only for some "intermediate range" of $n$: if $n$ truly passes over to infinity, the distribution collapses to a single point, since the variance goes to zero (as it should).
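A quick simulation can make this concrete. The sketch below (the uniform population, sample size, and trial count are arbitrary choices for illustration) draws many sample means and checks that their spread matches $\sigma^2/n$:

```python
import random
import statistics

# Draw sample means of size n from a uniform(0, 1) population
# (mu = 0.5, sigma^2 = 1/12) and compare their empirical variance
# with the CLT prediction sigma^2 / n.
random.seed(0)

n = 100          # sample size
trials = 5000    # number of sample means to collect

means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]

emp_var = statistics.pvariance(means)
theory_var = (1 / 12) / n   # sigma^2 / n for uniform(0, 1)

print(emp_var, theory_var)
```

Making `n` larger shrinks both numbers toward zero together, which is the "collapse to a point" in the limit.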
The notion of confidence interval is somewhat intuitive, but that may be keeping you from understanding what it means in more depth.
Say I have multiple samples $x_i$ from a population, and I wish to estimate the population mean $\mu$. A CI of, say, 95% represents an interval of possible values of $\mu$ such that, given my samples, the "probability" that $\mu$ lies in that interval is 95%.
We immediately see that there can be more than one such interval, since I could trade probability past the upper end for probability at the lower end of the interval, thus shifting the interval. Let's skirt that issue by demanding a symmetric interval about my sample mean.
But the "probability" is not well defined from the information I just presented!
In order to assign a probability, I have to make some assumptions about the population. The usual assumption is that the population variance is equal to the unbiased estimator of variance obtained from our sample. But we still have things backward: We can't honestly talk about the probability of the population mean being in some range, without any assumption about the a priori (before I saw my samples) probabilities of the mean being various values.
So we apply the usual sleight-of-mind logic employed by the frequentist point of view. We ask:
Given that the population variance is our unbiased sample variance estimate, what are the highest and lowest values of the population mean $\mu$ such that the chance of our sample being as far away from $\mu$ as it is, is no lower than 100% - 95% = 5%?
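As a sketch of that recipe (the data here are invented, and 1.96 is the two-sided 5% point of the standard normal), the interval can be computed like this:

```python
import math
import random
import statistics

# Frequentist recipe: treat the unbiased sample variance as the
# population variance, then take mu within 1.96 standard errors of
# the sample mean.
random.seed(1)
sample = [random.gauss(10.0, 2.0) for _ in range(50)]  # hypothetical data

xbar = statistics.fmean(sample)
s = statistics.stdev(sample)              # unbiased sample estimate
half_width = 1.96 * s / math.sqrt(len(sample))

lo, hi = xbar - half_width, xbar + half_width
print(f"95% CI for mu: ({lo:.2f}, {hi:.2f})")
```

The interval is symmetric about the sample mean, exactly the convention chosen above to avoid the shifting-interval ambiguity.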
Now let's go back to your problem. Since the population is finite, as you draw more samples (without replacement) you actually do learn something about the population. Suppose you had drawn all the objects but one, and you take your unbiased sample variance as the population variance. Your 95% confidence interval for the value of that one remaining object would be roughly $2\sigma$ wide, but your estimate of the population mean would have a standard deviation of only about $\sigma/N$. This is quite a bit smaller than the $\sigma/\sqrt N$ you would get for an infinite population or a small sampling of a large population.
Now when you draw that last sample, you know everything about the distribution. In particular, you know the mean exactly. Therefore any interval that includes the actual mean is a 100% CI. If you then say that the real CI is the tightest such interval, then it has width zero.
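A small sketch of the finite-population point (the population of $N = 200$ normal draws is hypothetical): with one object left undrawn, the mean estimate is already very close, and once every object has been drawn, the sample mean *is* the population mean:

```python
import random
import statistics

# A finite population of N objects, sampled without replacement.
random.seed(2)
N = 200
population = [random.gauss(0.0, 1.0) for _ in range(N)]
true_mean = statistics.fmean(population)

shuffled = random.sample(population, N)   # draw all N without replacement

# After drawing N - 1 objects, only one value is unknown, so the
# mean estimate can be off by at most (x_last - mean) / (N - 1).
partial_mean = statistics.fmean(shuffled[:-1])

# After drawing all N, the estimate equals the true mean exactly.
full_mean = statistics.fmean(shuffled)
print(abs(partial_mean - true_mean), abs(full_mean - true_mean))
```

The residual error after the last draw is zero, which is the width-zero "100% CI" described above.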
Best Answer
If $X$ is a normal random variable, you can record an observation of it, $x$, and compare it to the mean. The usual way to do this is to standardize the variable, i.e.,
$$z = \frac{x - \mu}{\sigma}$$
Let's say that $X_1, X_2, \ldots, X_n$ are random variables from the same distribution as $X$ above. If we record observations of each and calculate the mean, that's also a random variable. However, we can't expect our new random variable, the mean $\overline{X}$, to have the same distribution as our original distribution. It will have the same mean, but it won't have the same variance.
Think of it this way: Make $n$ larger and larger — record more and more observations. It seems that after a while, the mean of all those observations will be the same as the mean from the population. To make it a bit more concrete: Flip a coin a few times, letting $X = 1$ for heads and $X = 0$ for tails. Will your mean be $0.5$? Probably not. Flip it a few more times. Maybe you're a bit closer to $0.5$. By the time you flip the coin, say, a few thousand times, you'll probably be very close to $0.5$.
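The coin-flip experiment is easy to simulate; this sketch (the flip counts are chosen arbitrarily) prints the running mean for a few values of $n$:

```python
import random
import statistics

# Flip a fair coin n times (heads = 1, tails = 0) and record the
# mean; larger n pulls the mean toward 0.5.
random.seed(3)

def flip():
    return random.randint(0, 1)

means_by_n = {n: statistics.fmean(flip() for _ in range(n))
              for n in (10, 100, 10_000)}

for n, mean_n in means_by_n.items():
    print(n, mean_n)
```

With ten flips the mean can easily land at 0.3 or 0.7; by a few thousand flips it rarely strays more than a couple of percent from 0.5.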
In other words, when we record a sample mean, making many observations restricts, in the long run, how far we can stray from the true mean. This is reflected in the fact that
$$\text{Var}(\overline{X}) = \frac{\sigma^2}{n}$$
Note that as $n \rightarrow \infty$, $\text{Var}(\overline{X}) \rightarrow 0$: the sample mean homes in on the true mean (this shrinking variance is really the Law of Large Numbers at work).
The Central Limit Theorem tells us that, regardless of the distribution of the original random variable, as we take larger and larger samples, the distribution of the sample mean (i.e., of $\overline{X}$) approaches a normal distribution, and so we can use all the convenient properties of the normal distribution (like the standardized form). So when we write
$$z = \frac{\overline{x}-\mu}{\sigma / \sqrt{n}}$$
it's really the same thing as the earlier $z$: It's the difference of an observation and an expected value divided by the standard deviation of whatever distribution the observation came from. The new standard deviation (the standard error) is derived from the old one, but that's because the new distribution is derived from the old one.
Remember, $\sigma^2$ stands for population variance. So regardless of whether we're looking at a sample of size $n$ or just one observation, it's always the population variance. $\sigma^2/n$ is the variance of the sample mean in terms of the population variance. (And, of course, $\sigma/\sqrt{n}$ is the square root of the variance of the sample mean.)
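To tie the two $z$'s together, here is a small numeric sketch (the values $\mu = 100$, $\sigma = 15$, $n = 36$ are invented for illustration):

```python
import math

# Both z's are "observation minus mean, over the relevant standard
# deviation" -- what changes is which distribution the observation
# came from.
mu, sigma = 100.0, 15.0

# A single observation: standardize with sigma itself.
x = 110.0
z_single = (x - mu) / sigma

# A mean of n = 36 observations: standardize with the standard
# error sigma / sqrt(n).
n = 36
xbar = 103.0
z_mean = (xbar - mu) / (sigma / math.sqrt(n))

print(z_single, z_mean)   # 0.666..., 1.2
```

Notice that $\overline{x} = 103$ is closer to $\mu$ than $x = 110$ is, yet its $z$-score is larger, because the distribution of the mean is much tighter.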