I get that the variance of a random variable $X$ is the expected value of the squared deviation from its mean, and the standard deviation is just the square root of that. Can the standard deviation be interpreted as the absolute average deviation from the mean? Is the variance defined this way because, if we defined the standard deviation as the expected value of $X - E(X)$, we would get $0$?
Solved – Understanding variance and standard deviation
Related Solutions
If the goal of the standard deviation is to summarise the spread of a symmetrical data set (i.e. how far, in general, each datum is from the mean), then we need a good way of measuring that spread.
The benefits of squaring include:
- Squaring always gives a non-negative value, so positive and negative deviations cannot cancel each other out; the sum is zero only when every datum equals the mean.
- Squaring emphasizes larger differences, a feature that turns out to be both good and bad (think of the effect outliers have).
Squaring does, however, have a drawback as a measure of spread: the units are all squared, whereas we might prefer the spread to be in the same units as the original data (think of squared pounds, squared dollars, or squared apples). Taking the square root returns us to the original units.
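To make both points concrete, here is a minimal sketch (the sample is made up, with one deliberate outlier) comparing raw, absolute, and squared deviations:

```python
import numpy as np

# Made-up sample with one deliberate outlier (100)
data = np.array([2.0, 4.0, 4.0, 5.0, 100.0])
deviations = data - data.mean()

print(deviations.sum())                   # ~0: raw deviations always cancel out
print(np.abs(deviations).mean())          # 30.8: mean absolute deviation weights each datum equally
print((deviations ** 2).mean())           # 1483.2: the squared outlier dominates the variance
print(np.sqrt((deviations ** 2).mean()))  # ~38.5: the square root returns to the original units
```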
I suppose you could say that the absolute difference assigns equal weight to each datum's deviation, whereas squaring emphasises the extremes. Technically, though, as others have pointed out, squaring makes the algebra much easier to work with and offers properties that the absolute method does not: for example, the variance equals the expected value of the square of the variable minus the square of its mean, $\operatorname{Var}(X) = E[X^2] - (E[X])^2$.
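That identity follows from expanding the square and using linearity of expectation, writing $\mu = E[X]$:

$$E[(X-\mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.$$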
Note, however, that there is no reason you couldn't take the absolute difference if that is how you prefer to view 'spread' (much as some people treat 5% as a magical threshold for $p$-values, when in fact it is situation dependent). There are, indeed, several competing measures of spread.
My view is to use the squared values because I like to think of how it relates to the Pythagorean Theorem of Statistics, $c = \sqrt{a^2 + b^2}$: this also helps me remember that when working with independent random variables, variances add but standard deviations don't. That's just my personal, subjective preference, which I mostly use as a memory aid, so feel free to ignore this paragraph.
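If you want to see the "variances add" fact numerically, here is a quick sketch (the standard deviations 3 and 4 are arbitrary, chosen to echo the 3-4-5 right triangle):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 3, size=1_000_000)  # independent, sd = 3
y = rng.normal(0, 4, size=1_000_000)  # independent, sd = 4

print(np.var(x + y))  # ~25 = 3**2 + 4**2: variances add
print(np.std(x + y))  # ~5 = sqrt(3**2 + 4**2), not 3 + 4 = 7
```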
An interesting analysis can be read here:
- Stephen Gorard, "Revisiting a 90-year-old debate: the advantages of the mean deviation", Department of Educational Studies, University of York; paper presented at the British Educational Research Association Annual Conference, University of Manchester, 16-18 September 2004
My intuition is that the standard deviation is a measure of the spread of the data.
You have a good point that whether it counts as wide or tight depends on our underlying assumption about the distribution of the data.
Caveat: a measure of spread is most helpful when the distribution of your data is symmetric around the mean and roughly bell-shaped, that is, approximately Normal.
In the case where data is approximately Normal, the standard deviation has a canonical interpretation:
- Region: Sample mean +/- 1 standard deviation contains roughly 68% of the data
- Region: Sample mean +/- 2 standard deviations contains roughly 95% of the data
- Region: Sample mean +/- 3 standard deviations contains roughly 99.7% of the data
(see the first graphic in the Wikipedia article on the 68-95-99.7 rule)
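These coverage figures are easy to verify empirically; a quick sketch (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=5, scale=2.83, size=1_000_000)
m, s = sample.mean(), sample.std()

for k in (1, 2, 3):
    coverage = np.mean(np.abs(sample - m) <= k * s)
    print(f"mean +/- {k} sd: {coverage:.1%}")  # ~68.3%, ~95.4%, ~99.7%
```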
This means that if we know the population mean is 5 and the standard deviation is 2.83, and we assume the distribution is approximately Normal, I would tell you that I am reasonably certain that if we make a great many observations, only 5% will be smaller than $-0.66 = 5 - 2 \times 2.83$ or bigger than $10.66 = 5 + 2 \times 2.83$.
Notice the impact the standard deviation has on this interval: the more spread, the more uncertainty.
Furthermore, in the general case where the data is not even approximately Normal, but still symmetric, you know that there exists some $\alpha$ for which:
- Region: Sample mean +/- $\alpha$ standard deviations contains roughly 95% of the data
You can either learn $\alpha$ from a sub-sample (as sketched below) or assume $\alpha = 2$; this often gives you a good rule of thumb for estimating in your head what future observations to expect, or which new observations can be considered outliers. (Keep the caveat in mind, though!)
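One way to learn such an $\alpha$, sketched here with hypothetical data (a Laplace distribution, which is symmetric but heavier-tailed than the Normal), is to take the empirical 95% quantile of the standardized absolute deviations on a sub-sample:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.laplace(loc=5.0, scale=2.0, size=100_000)  # symmetric, non-Normal

subsample = data[:10_000]  # learn alpha on a sub-sample
z = np.abs(subsample - subsample.mean()) / subsample.std()
alpha = np.quantile(z, 0.95)  # mean +/- alpha*sd should cover ~95%

rest = data[10_000:]
coverage = np.mean(np.abs(rest - rest.mean()) <= alpha * rest.std())
print(alpha, coverage)  # alpha comes out near 2.1 here, coverage ~95%
```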
> I don't see how you are supposed to interpret it. Does 2.83 mean the values are spread very wide or are they all tightly clustered around the mean...
I guess every question asking "wide or tight?" should also specify: "in relation to what?". One suggestion is to use a well-known distribution as a reference. Depending on the context, it might be useful to ask: "Is it much wider, or tighter, than a Normal or a Poisson?"
EDIT: based on a useful hint in the comments, one more aspect of the standard deviation as a distance measure.
Yet another intuition for the usefulness of the standard deviation $s_N$ is that it is a distance measure between the sample data $x_1, \dots, x_N$ and its mean $\bar{x}$:
$s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}$
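Put differently, $s_N$ is the Euclidean distance between the data vector $(x_1, \dots, x_N)$ and the constant vector $(\bar{x}, \dots, \bar{x})$, rescaled by $1/\sqrt{N}$. A one-line check with a made-up sample:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# Euclidean distance from x to the constant vector of its mean, scaled by 1/sqrt(N)
print(np.linalg.norm(x - x.mean()) / np.sqrt(len(x)))  # 2.0
print(x.std())  # 2.0: np.std uses the same 1/N convention
```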
For comparison, the mean squared error (MSE), one of the most popular error measures in statistics, is defined as:
$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^n (\hat{Y}_i - Y_i)^2$
The question can be raised: why this particular distance function? Why squared distances rather than, say, absolute distances? And why take the square root?
Quadratic distance, or error, functions have the advantage that we can both differentiate them and easily minimise them. As for the square root, it aids interpretability by converting the error back to the scale of our observed data.
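A concrete payoff of that minimisation property: the sum of squared deviations is minimised by the mean, while the sum of absolute deviations is minimised by the median. A small numerical sketch (the skewed sample is made up):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # made-up sample with an outlier
grid = np.linspace(0, 100, 100_001)           # candidate centre values

sq_loss = ((data[:, None] - grid) ** 2).sum(axis=0)
abs_loss = np.abs(data[:, None] - grid).sum(axis=0)

print(grid[sq_loss.argmin()], data.mean())       # ~22.0 22.0: squared loss -> mean
print(grid[abs_loss.argmin()], np.median(data))  # ~3.0 3.0: absolute loss -> median
```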
Best Answer
The average deviation from the mean is $0$, that is, $E[X-\mu]=0$, and so, taken literally, the absolute average deviation $|E[X-\mu]|$ is also $0$. Changing your question slightly to ask about the average of the absolute deviation, $E[|X-\mu|]$: no, that is not the standard deviation either; that is, $$\sigma = \sqrt{E[(X-\mu)^2]} \neq E[|X-\mu|].$$
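Two facts sharpen that inequality. First, by Jensen's inequality applied to the convex function $t \mapsto t^2$, $(E[|X-\mu|])^2 \le E[(X-\mu)^2]$, so the mean absolute deviation never exceeds the standard deviation. Second, a standard concrete example: for $X \sim N(0,1)$ we have $\sigma = 1$, yet

$$E[|X|] = 2\int_0^\infty \frac{x}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = \sqrt{\frac{2}{\pi}} \approx 0.798.$$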