Standard Error Formula – Why Is the Formula for Standard Error the Way It Is?

intuition, standard error

So just why is $SE = \frac{s}{\sqrt n}$? How should one interpret or articulate the reason for having $\sqrt n$ in the denominator? Intuitively speaking, why do we divide the sample standard deviation by the square root of the sample size? And how or why did it come to be called the standard "error"?
(The question applies equally when the true standard deviation of the population is used: $\frac{\sigma}{\sqrt n}$)

Is there an intuitive derivation of $SE$ that can make this clear?

Please assume you are explaining it to a 6-year-old who understands mean and sample size 🙂

Best Answer

This comes from the fact that $\newcommand{\Var}{\operatorname{Var}}\newcommand{\Cov}{\operatorname{Cov}}\Var(X+Y) = \Var(X) + \Var(Y) + 2\cdot\Cov(X,Y)$ and for a constant $a$, $\Var( a X ) = a^2 \Var(X)$.

Since we are assuming that the individual observations are independent, the $\Cov(X,Y)$ term is $0$, and since we assume that the observations are identically distributed, all the variances are $\sigma^2$. So

$\Var\left( \frac{1}{n} \sum X_i \right) = \frac{1}{n^2} \Var\left( \sum X_i \right) = \frac{1}{n^2} \sum \Var(X_i) = \frac{1}{n^2} \times \sum_{i=1}^n \sigma^2 = \frac{n}{n^2} \sigma^2 = \frac{\sigma^2}{n}$

And when we take the square root of that (because the variance is in squared units, which are harder to think in) we get the standard deviation of the sample mean, $\dfrac{\sigma}{\sqrt{n}}$.
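The result can be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes normally distributed observations with made-up values `mu = 5.0` and `sigma = 2.0`, and the function name `sd_of_sample_means` is hypothetical. It draws many independent samples of size `n`, computes each sample's mean, and compares the standard deviation of those means with $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

def sd_of_sample_means(n, mu=5.0, sigma=2.0, reps=20000, seed=0):
    """Draw `reps` independent samples of size n and return the
    standard deviation of their means (all parameters are illustrative)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

n = 25
observed = sd_of_sample_means(n)   # spread of the simulated sample means
predicted = 2.0 / math.sqrt(n)     # sigma / sqrt(n) with sigma = 2.0
print(f"observed: {observed:.3f}, predicted: {predicted:.3f}")
```

The two numbers should agree closely, and the agreement should tighten as `reps` grows.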

More intuitively, think of 2 statistics classes. In the first, the teacher assigns each of the students to draw a sample of size 10 from a set of tiles with numbers on them (the teacher knows the true mean of this population, but the students don't) and compute the mean of their sample. The second teacher assigns each of their students to take a sample of size 100 from the same set of tiles and compute the mean. Would you expect every sample mean to exactly match the population mean, or to vary about it? Would you expect the spread of the sample means to be the same in both classes, or would the means from the 2nd class tend to be closer to the population mean? That's why it makes sense to divide by a function of the sample size. The square root means we have a law of diminishing returns: to halve the standard error you need to quadruple the sample size.
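The two classrooms can be simulated directly. This is a rough sketch under stated assumptions, not something from the original answer: the tiles are taken to be numbered 1 to 100, students draw with replacement (so the draws are independent), each class has 2000 students so the spreads are estimated stably, and the helper name `spread_of_class_means` is made up for illustration.

```python
import random
import statistics

def spread_of_class_means(sample_size, tiles, rng, students=2000):
    """Each student draws `sample_size` tiles with replacement and reports
    the mean; return the standard deviation of the reported means."""
    means = []
    for _ in range(students):
        sample = rng.choices(tiles, k=sample_size)
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

rng = random.Random(42)
tiles = list(range(1, 101))  # hypothetical set of tiles numbered 1..100

spread_10 = spread_of_class_means(10, tiles, rng)    # first class
spread_25 = spread_of_class_means(25, tiles, rng)    # extra class for comparison
spread_100 = spread_of_class_means(100, tiles, rng)  # second class

# 10x the sample size should shrink the spread by about sqrt(10), roughly 3.16
print("n=10 vs n=100 ratio:", spread_10 / spread_100)
# 4x the sample size should roughly halve the spread
print("n=25 vs n=100 ratio:", spread_25 / spread_100)
```

The class with samples of size 100 has sample means clustered much more tightly around the population mean, but only by a factor of about $\sqrt{10}$, not 10.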

As for the name, the full name is "The estimated standard deviation of the sampling distribution of x-bar"; it only takes saying that a few times before you appreciate having a shortened form. I don't know who first substituted "error" for "deviation" this way, but it stuck. The standard deviation measures variability of individual observations; the standard error measures variability in estimates of parameters (based on observations).
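To make the distinction concrete, here is a small sketch with made-up data (the eight values below are purely illustrative). It computes the sample standard deviation $s$, which describes how much individual observations vary, and the standard error $s/\sqrt{n}$, which describes how much the sample mean would vary from sample to sample.

```python
import math
import statistics

# Hypothetical observations, chosen only so the arithmetic is easy to follow
sample = [4, 8, 6, 5, 3, 7, 9, 6]
n = len(sample)

mean = statistics.fmean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)             # estimated standard error of the mean

print(f"mean = {mean}, s = {s}, SE = {se:.4f}")
```

For these values the mean is 6, $s$ is 2 and the standard error is $2/\sqrt{8} \approx 0.707$: the estimate of the mean is considerably less variable than any single observation.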
