Solved – Question about standard deviation and central limit theorem

Tags: central limit theorem, mean, standard deviation

I have a quick question about the central limit theorem. Let's say I measure some value that comes from an arbitrary distribution N times, and I repeat this M times. I understand that if I calculate the mean of the N values each time, I will have a set of M values that follows a normal distribution. But if I instead compute the sample standard deviation from the N values, will the resulting distribution also be normal? Following the derivation of the CLT I do not see this to be the case, but intuitively I think it is true, at least for some distributions. Any light on the issue would be greatly appreciated.

First I will quote the CLT from Wikipedia:

the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.

My question is then a variant of the quote from the Wikipedia page: does the central limit theorem also imply that, given certain conditions, the STANDARD DEVIATION of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed? Here is a series of plots I made from random numbers that follow a beta distribution. I generated 1000 sets of 5000 points each. The first plot is a histogram of the first set, the second is a histogram of the 1000 calculated means, and the third is a histogram of the 1000 calculated standard deviations. A short script reproducing this simulation is given after the figures.

[Figures: histogram of one set of 5000 beta-distributed draws; histogram of the 1000 sample means; histogram of the 1000 sample standard deviations]
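Here is a minimal sketch of the simulation described above. The Beta(2, 5) parameters and the use of NumPy/Matplotlib are my own assumptions, since the post does not state them:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

M, N = 1000, 5000  # 1000 sets of 5000 points each, as in the post
samples = rng.beta(2, 5, size=(M, N))  # Beta(2, 5) is an assumed choice

means = samples.mean(axis=1)  # M sample means
stds = samples.std(axis=1)    # M sample standard deviations (1/n convention)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(samples[0], bins=50)
axes[0].set_title("First set of N points")
axes[1].hist(means, bins=50)
axes[1].set_title("1000 sample means")
axes[2].hist(stds, bins=50)
axes[2].set_title("1000 sample standard deviations")
plt.tight_layout()
plt.show()
```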

Best Answer

Yes, the sample standard deviation is asymptotically normal. Let the sample standard deviation be $\hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2}$, and let $\sigma$ be the population standard deviation. Let's use the central limit theorem to show that $$ \sqrt{n}(\hat{\sigma} - \sigma) \xrightarrow{d} N(0, V). $$

First write things as $$ \sqrt{n}(\hat{\sigma} - \sigma) = \sqrt{n}\left(\sqrt{ \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2} - \sqrt{\sigma^2} \right).$$ The central limit theorem tells us how sample moments minus population moments behave. If we didn't have the square roots above, we'd have exactly a sample moment minus a population one, and we could apply the central limit theorem directly. To get rid of the square roots, let's take a Taylor expansion of the first square root around $\sigma^2$: $$ \sqrt{ \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2} = \sqrt{\sigma^2} + \frac{1}{2\sigma} \left( \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 - \sigma^2\right) + O\left(\left( \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 - \sigma^2\right)^2\right). $$

Plugging this into the above, we have $$\sqrt{n}(\hat{\sigma} - \sigma) = \frac{\sqrt{n}}{2 \sigma} \left( \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 - \sigma^2\right) + O\left(\sqrt{n}\left( \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 - \sigma^2\right)^2\right).$$

Rearranging the first term on the right, using the identity $\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^n (x_i - E[x])^2 - (\bar{x} - E[x])^2$, gives $$\frac{\sqrt{n}}{2 \sigma} \left( \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 - \sigma^2\right) = \frac{1}{2\sigma \sqrt{n}} \sum_{i=1}^n \left((x_i - E[x])^2 - \sigma^2\right) - \frac{\sqrt{n}}{2\sigma}(\bar{x} - E[x])^2. $$

The central limit theorem tells us that under some conditions $\frac{1}{\sqrt{n}} \sum_{i=1}^n (y_i - E[y]) \xrightarrow{d} N(0, \sigma_y^2)$. With $y_i = (x_i - E[x])^2$ and $E[y] = \sigma^2$, the first term on the right converges in distribution to a normal.

We can write the second term as $\sqrt{n}(\bar{x} - E[x]) \, (\bar{x} - E[x])$; since $\sqrt{n}(\bar{x} - E[x]) \xrightarrow{d} N$ and $(\bar{x} - E[x]) \xrightarrow{p} 0$, Slutsky's lemma tells us the product converges in probability to 0.

Similarly, we could show that $\sqrt{n}\left( \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2 - \sigma^2\right)^2 \xrightarrow{p} 0$, so the remainder from the Taylor expansion vanishes.
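Putting the three pieces together, the limiting variance can be written out explicitly: the only surviving term is $\frac{1}{2\sigma}$ times the centered average of $y_i = (x_i - E[x])^2$, so

$$ V = \frac{\operatorname{Var}\left((x - E[x])^2\right)}{4\sigma^2} = \frac{\mu_4 - \sigma^4}{4\sigma^2}, $$

where $\mu_4 = E\left[(x - E[x])^4\right]$ is the fourth central moment. (This step is not spelled out above, but it follows directly from $\operatorname{Var}(y) = E[y^2] - (E[y])^2$ with $E[y] = \sigma^2$.)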

This Taylor expansion trick comes up often, so it has a name. It's called the delta method.
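As a quick numerical sanity check (my own sketch, not part of the original answer), one can compare the Monte Carlo spread of $\hat{\sigma}$ against the delta-method prediction $\sqrt{V/n}$; the two should agree closely for large $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 5.0  # same assumed beta parameters as above
M, N = 2000, 5000

samples = rng.beta(a, b, size=(M, N))
stds = samples.std(axis=1)  # 1/n convention, matching the answer

# Delta-method prediction: Var(sigma_hat) ~= (mu4 - sigma^4) / (4 sigma^2 N).
# Estimate sigma^2 and mu4 from one large reference sample rather than from
# the closed-form beta moments, to keep the sketch short.
ref = rng.beta(a, b, size=1_000_000)
sigma2 = ref.var()
mu4 = np.mean((ref - ref.mean()) ** 4)
predicted_sd = np.sqrt((mu4 - sigma2**2) / (4 * sigma2 * N))

print(f"Monte Carlo sd of sigma_hat: {stds.std():.6f}")
print(f"Delta-method prediction:     {predicted_sd:.6f}")
```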
