Solved – Why do we take the square root of variance to create standard deviation

Sorry if this has been answered elsewhere; I haven't been able to find it.

I am wondering why we take the square root of the variance, in particular, to create the standard deviation. What is it about taking the square root that produces a useful value?

Best Answer

In some sense this is a trivial question, but in another, it is actually quite deep!

  • As others have mentioned, taking the square root means $\operatorname{Stdev}[X]$ has the same units as $X$.

  • Taking the square root gives you absolute homogeneity, aka absolute scalability. For any scalar $\alpha$ and random variable $X$, we have: $$ \operatorname{Stdev}[\alpha X] = |\alpha| \operatorname{Stdev}[X]$$ Absolute homogeneity is a required property of a norm. The standard deviation can be interpreted as a norm (on the vector space of mean-zero random variables) in a similar way that $\sqrt{x^2 + y^2+z^2}$ is the standard Euclidean norm in three-dimensional space. The standard deviation is a measure of distance between a random variable and its mean. (See the numerical check right after this list.)
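
As a quick sanity check, here is a minimal NumPy sketch of that homogeneity property (the sample distribution and the scalar $\alpha$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # arbitrary samples of X
alpha = -4.0                                      # arbitrary scalar

lhs = np.std(alpha * x)        # Stdev[alpha * X]
rhs = abs(alpha) * np.std(x)   # |alpha| * Stdev[X]
assert np.isclose(lhs, rhs)    # equal, up to floating-point error
```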

Standard deviation and the $L_2$ norm

Finite-dimensional case:

In an $n$-dimensional vector space, the standard Euclidean norm, aka the $L_2$ norm, is defined as:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_i x_i^2}$$

More broadly, the $p$-norm $\|\mathbf{x}\|_p = \left(\sum_i |x_i|^p \right)^{\frac{1}{p}}$ takes the $p$th root to get absolute homogeneity: $\|\alpha \mathbf{x}\|_p = \left( \sum_i |\alpha x_i|^p \right)^\frac{1}{p} = | \alpha | \left( \sum_i |x_i|^p \right)^\frac{1}{p} = |\alpha | \|\mathbf{x}\|_p $.
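
Here is a small sketch (with an arbitrary vector and scalar) confirming that the $p$th root is exactly what delivers $\|\alpha \mathbf{x}\|_p = |\alpha| \|\mathbf{x}\|_p$:

```python
import numpy as np

def p_norm(x, p):
    """The l_p norm: (sum_i |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, -2.0, 3.0])   # arbitrary vector
alpha = -2.5                     # arbitrary scalar
for p in (1, 2, 3):
    assert np.isclose(p_norm(alpha * x, p), abs(alpha) * p_norm(x, p))
```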

If you have positive weights $q_i$, then the weighted norm $\sqrt{\sum_i x_i^2 q_i}$ is also a valid norm. Furthermore, it's the standard deviation if the $q_i$ represent probabilities and $\operatorname{E}[\mathbf{x}] \equiv \sum_i x_i q_i = 0$.
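
For instance, here is a minimal check with a made-up discrete random variable whose mean is zero under probabilities $q_i$:

```python
import numpy as np

x = np.array([-2.0, -1.0, 4.0])  # outcomes of a discrete random variable
q = np.array([0.5, 0.2, 0.3])    # probabilities; note sum(q) == 1
assert np.isclose(np.sum(x * q), 0.0)  # E[X] = 0, so no demeaning needed

weighted_norm = np.sqrt(np.sum(x**2 * q))       # sqrt(sum_i x_i^2 q_i)
variance = np.sum(q * x**2) - np.sum(q * x)**2  # E[X^2] - E[X]^2
assert np.isclose(weighted_norm, np.sqrt(variance))
```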

Infinite-dimensional case:

In an infinite-dimensional Hilbert space, we may similarly define the $L_2$ norm:

$$ \|X\|_2 = \sqrt{\int_\Omega X(\omega)^2 \, dP(\omega) }$$

If $X$ is a mean-zero random variable and $P$ is the probability measure, what's the standard deviation? It's the same: $\sqrt{\int_\Omega X(\omega)^2 \, dP(\omega) }$.
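
A Monte Carlo sketch makes this concrete: drawing $\omega \sim P$ and averaging $X(\omega)^2$ approximates the integral. (The uniform sample space and the particular $X$ below are arbitrary choices for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

def X(omega):
    # A mean-zero random variable on omega ~ Uniform(0, 1).
    return omega - 0.5

omega = rng.uniform(0.0, 1.0, size=1_000_000)  # draws from P
l2_norm = np.sqrt(np.mean(X(omega) ** 2))      # approximates ||X||_2
print(l2_norm, 1 / np.sqrt(12))  # exact Stdev of Uniform(0,1) is 1/sqrt(12)
```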

Summary:

Taking the square root means the standard deviation satisfies absolute homogeneity, a required property of a norm.

On a space of random variables, $\langle X, Y \rangle = \operatorname{E}[XY]$ is an inner product, and $\|X\|_2 = \sqrt{\operatorname{E}[X^2]}$ is the norm induced by that inner product. Thus the standard deviation is the norm of a demeaned random variable: $$\operatorname{Stdev}[X] = \|X - \operatorname{E}[X]\|_2$$ It's a measure of distance from the mean $\operatorname{E}[X]$ to $X$.
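
In sample terms the identity is easy to see (with arbitrary data): demean, take the root-mean-square, and you get exactly what np.std computes:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)  # arbitrary samples of X

demeaned = x - x.mean()                             # X - E[X]
norm_of_demeaned = np.sqrt(np.mean(demeaned ** 2))  # ||X - E[X]||_2
assert np.isclose(norm_of_demeaned, np.std(x))      # identical by definition
```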

(Technical point: while $\sqrt{\operatorname{E}[X^2]}$ is a norm, the standard deviation $\sqrt{\operatorname{E}[(X - \operatorname{E}[X])^2]}$ isn't a norm over random variables in general, because a requirement for a normed vector space is $\|x\| = 0$ if and only if $x = \mathbf{0}$. A standard deviation of 0 only implies the random variable is almost surely constant, not that it is the zero element.)
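
A one-line counterexample: a constant random variable equal to 7 is not the zero element, yet its standard deviation is 0.

```python
import numpy as np

c = np.full(10, 7.0)  # a "random" variable that always equals 7
print(np.std(c))      # 0.0, even though c is not the zero element
```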