Solved – Standard Deviation of Normal Model

distributions, normal distribution, standard deviation

A normal model is defined by its mean and standard deviation. However, what precisely is meant by its standard deviation parameter?

I know how the standard deviation applies to discrete data sets, and intuitively for a probability distribution it would be the measure of "dispersion."

However, when discussing normal models one often refers to the percent of data lying within $X$ standard deviations from the mean. Shouldn't all the data lie within the normal model parameter of $\sigma$ though?

Consider for instance the standard normal distribution, which has a standard deviation of 1. How then is it valid to talk about values that are $2$ standard deviations from the mean, and why would only $95$ percent of data be contained within this range?

[Figure: sketch of a normal density curve.]

Best Answer

A normal model is defined by its mean and standard deviation. However, what precisely is meant by its standard deviation parameter?

I know how the standard deviation applies to discrete data sets, and intuitively for a probability distribution it would be the measure of "dispersion."

That intuition is correct.

I'll have to bring in a little mathematics here. However, I am fudging and handwaving a little because I am guessing you don't have much mathematics background.

First, let's start with expectation ("mean"). The expectation of a discrete random variable looks very like the sample average (and indeed, they're even more closely linked than it might first appear).

Assume you have a sample where each data value is different from the others. If you regard the sample as having a proportion $p_i=\frac{1}{n}$ at each data value, the sample average is $p_1x_1+p_2 x_2+...+p_n x_n=\sum_i p_i x_i$.

The same formula applies when there could be repeated values, but then the proportions ($p_i$) won't all be $\frac{1}{n}$ -- they might just as easily be $\frac{3}{n}$ or $\frac{19}{n}$, etc. The formula still works. Now imagine that $n$ becomes very, very large -- tending off to infinity. If we have an infinite population of values rather than a sample, the expectation looks the same: the population mean, or expectation, is $E(X)=\sum_i p_i x_i$ ($=\mu$, say), where the sum is over all possible values the variable can take, even if there are infinitely many of them (e.g. the number of tosses until you get a head has no upper limit).
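If it helps to see that weighted-sum form in action, here's a small Python sketch (the data values are made up purely for illustration):

```python
# Sketch: the sample mean as a probability-weighted sum (made-up data).
from collections import Counter

x = [2, 3, 3, 5, 7, 3, 5]           # hypothetical sample
n = len(x)

# Proportion p_i at each distinct value (3/n, 1/n, etc. when values repeat)
props = {value: count / n for value, count in Counter(x).items()}

weighted_mean = sum(p * value for value, p in props.items())
print(weighted_mean, sum(x) / n)    # both are 4 (up to floating-point rounding)
```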

Similarly, variance for a (discrete) population is the average squared deviation from the mean - $\sum_i p_i(x_i-\mu)^2$. (It's also half the average squared distance between pairs of random values.)

Population standard deviation then is the square root of the variance. It represents a kind of typical distance of values from the mean, typically a little larger than the ordinary average distance (about 25% larger in the case of the normal distribution). Some values will be more than that typical distance from the mean and some will be closer than it.
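Continuing with the same made-up values, a short sketch of the variance as the average squared deviation from the mean, together with a check of the pairwise-distance identity mentioned in passing:

```python
# Sketch: variance as the average squared deviation from the mean,
# and as half the average squared distance between pairs (made-up data).
from itertools import product

x = [2, 3, 3, 5, 7, 3, 5]           # hypothetical sample, treated as a population
n = len(x)
mu = sum(x) / n

var = sum((xi - mu) ** 2 for xi in x) / n
sd = var ** 0.5

# Half the average squared distance over all ordered pairs of values
pair_var = sum((xi - xj) ** 2 for xi, xj in product(x, x)) / (2 * n * n)

print(var, pair_var)   # identical: ~2.571 and ~2.571
print(sd)              # ~1.60 -- the "typical" distance from the mean
```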

For a continuous distribution we replace the proportion of the population at each value with its density, and we have to do the continuous equivalent of summing -- integration. We can no longer write a list of possible values, since between any two possible values there's usually an infinity of other possible values; each individual point has probability zero, but we have a positive chance of lying within an interval -- so we can't usefully talk about $P(X=3)$, but we can say something about the chance of being between 2 and 4, say $P(2\leq X \leq 4)$.
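To make that concrete, here's a tiny sketch using a normal variable (the choice of $\mu=3$, $\sigma=1$ is arbitrary, just to have numbers to work with):

```python
# Sketch: single points have probability zero, intervals don't
# (normal with mu = 3, sigma = 1, chosen arbitrarily).
from scipy.stats import norm

X = norm(loc=3, scale=1)

print(X.pdf(3))             # ~0.399 -- a density height, NOT the probability of exactly 3
print(X.cdf(4) - X.cdf(2))  # ~0.683 -- P(2 <= X <= 4)
```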

[If you've seen integration before, $\mu=E(X)=\int x f(x) dx$ where $f(x)$ is the density of the variable. Similarly $\sigma^2=\text{Var}(X)=\int (x-\mu)^2 f(x) dx$]

The density for a normal distribution is what you've drawn in your question. It indicates that an interval near $\mu$ has a higher probability than an interval of the same size far from $\mu$. There's an explicit functional form for the density, $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2},\,-\infty<x<\infty$. Here $\mu$ is the population mean and $\sigma^2$ is the population variance ($\sigma$ is the population standard deviation). It turns out that if you evaluate the above integrals with this density you do indeed get $\mu$ and $\sigma^2$ back out.
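If you'd rather check that numerically than do the calculus, here's a sketch using scipy's integrator (the values $\mu=10$, $\sigma=3$ are arbitrary); it also confirms the "about 25% larger than the ordinary average distance" remark from earlier:

```python
# Sketch: check numerically that the normal density integrates back to mu and sigma^2.
import numpy as np
from scipy.integrate import quad

mu, sigma = 10.0, 3.0               # arbitrary illustrative parameters

def f(x):
    """Normal density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu) ** 2 * f(x), -np.inf, np.inf)

# Mean absolute distance from mu, using the symmetry of the density
mad = 2 * quad(lambda x: (x - mu) * f(x), mu, np.inf)[0]

print(mean)                # ~10.0  (mu)
print(var)                 # ~9.0   (sigma^2)
print(np.sqrt(var) / mad)  # ~1.25  (sd is about 25% larger than the mean distance)
```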

Now, for the normal distribution, there is (as indicated above) a nonzero probability of lying in some finite interval any distance from $\mu$ -- no matter how far from $\mu$ you put your interval, there's always some chance of a value falling into it (see the modified version of your picture below -- even though it looks like there's no more density past about 2.5 standard deviations above the mean, there is; it's just so small it's hidden by the axis).

Just as with a sample, where (nearly always) there's some data further than one standard deviation from the mean, the same is true of probability distributions -- unless the distribution takes only two values with equal probability, there's always some of the distribution more than one standard deviation from the mean.

With a normal distribution, about one billionth of the population is more than $6\sigma$ above the mean (and the same below, because it's symmetric). That might sound negligible, but (for example) imagine measured IQ were actually normally distributed (it isn't, quite, but it's about as near to normal as we're likely to find for the purposes of this discussion). That would mean somewhere around 7 people on earth have IQs more than 6 standard deviations above the mean.
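The arithmetic behind that, as a quick sketch (the 7 billion world-population figure is just the rough one used above):

```python
# Sketch: how rare is "more than 6 standard deviations above the mean" under a normal model?
from scipy.stats import norm

tail = norm.sf(6)                   # P(Z > 6) for a standard normal
print(tail)                         # ~9.9e-10, about one in a billion

world_population = 7e9              # rough figure, as in the text
print(tail * world_population)      # ~7 people
```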

However, when discussing normal models one often refers to the percent of data lying within $X$ standard deviations from the mean. Shouldn't all the data lie within the normal model parameter of $\sigma$ though?

No, $\sigma$ doesn't represent the largest possible deviation from the mean (and for the normal distribution, there simply isn't a largest possible deviation) -- as mentioned before, $\sigma$ represents a kind of typical deviation from the mean:

[Figure: modified version of the question's normal curve, with the values one standard deviation either side of the mean marked.]

Values that are $\sigma$ above the mean occur where the curve is decreasing most rapidly (the part where it's "straightest", roughly where I have marked).
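If you like, you can confirm this symbolically -- for the standard normal, the density's inflection points (where it's momentarily "straight") sit exactly one standard deviation either side of the mean:

```python
# Sketch: the normal density falls fastest at mu +/- sigma (inflection points),
# checked symbolically for the standard normal (mu = 0, sigma = 1).
import sympy as sp

x = sp.symbols('x')
f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)   # standard normal density

print(sp.solve(sp.diff(f, x, 2), x))         # [-1, 1]: one sd either side of the mean
```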

Only about 68% of a normal population is within one standard deviation of the mean; about 16% is more than $1\sigma$ above the mean (and similarly on the low side).
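Those percentages come straight out of the normal cumulative distribution function, e.g.:

```python
# Sketch: fractions of a normal population within / beyond one standard deviation.
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(-1))   # ~0.683 -- within one sd of the mean
print(norm.sf(1))                   # ~0.159 -- more than one sd above the mean
```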

Consider for instance the standard normal distribution, which has a standard deviation of 1. How then is it valid to talk about values that are $2$ standard deviations from the mean, and why would only $95$ percent of data be contained within this range?

As I explained above, the standard deviation is a typical distance from the mean ("typical" in the sense of a special kind of average that's a bit larger than the ordinary mean distance from the mean), so we expect some values to be further from the mean than that.

As for why about 16% is more than one standard deviation above the mean and about 2.5% is more than two standard deviations above it (i.e. why those values and not some others) -- in essence, that's just how it works out for the normal. Other distributions have other amounts outside those ranges.
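As a quick illustration of that last point, here's the same calculation for the normal alongside a couple of other common distributions (exponential and uniform, chosen only as examples):

```python
# Sketch: the 68/95 figures are specific to the normal; other distributions
# put different amounts within one or two standard deviations of their mean.
from scipy.stats import norm, expon, uniform

for name, dist in [("normal", norm()), ("exponential", expon()), ("uniform", uniform())]:
    mu, sd = dist.mean(), dist.std()
    within1 = dist.cdf(mu + sd) - dist.cdf(mu - sd)
    within2 = dist.cdf(mu + 2 * sd) - dist.cdf(mu - 2 * sd)
    print(f"{name:12s} within 1 sd: {within1:.3f}   within 2 sd: {within2:.3f}")

# Output (roughly): normal 0.683 / 0.954, exponential 0.865 / 0.950, uniform 0.577 / 1.000
```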