How do you find the sample standard deviation when given the population standard deviation? What formula do you use? If you can make up an example that would be great.
[Math] sample standard deviation given population standard deviation
Related Solutions
There are, in fact, two different formulas for standard deviation here: The population standard deviation $\sigma$ and the sample standard deviation $s$.
If $x_1, x_2, \ldots, x_N$ denote all $N$ values from a population, then the (population) standard deviation is $$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2},$$ where $\mu$ is the mean of the population.
If $x_1, x_2, \ldots, x_N$ denote $N$ values from a sample, however, then the (sample) standard deviation is $$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \bar{x})^2},$$ where $\bar{x}$ is the mean of the sample.
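The two formulas can be computed side by side. Here is a small sketch with made-up data, using only Python's standard library; the same values are treated first as a complete population (divide by $N$) and then as a sample (divide by $N-1$):

```python
import math

# Hypothetical data: treated first as an entire population,
# then as a sample drawn from a larger population.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(values)
mean = sum(values) / N                                   # plays the role of mu or x-bar
ss = sum((x - mean) ** 2 for x in values)                # sum of squared deviations

sigma = math.sqrt(ss / N)        # population standard deviation: divide by N
s = math.sqrt(ss / (N - 1))      # sample standard deviation: divide by N - 1

print(sigma)  # 2.0
print(s)      # 2.138... (slightly larger, as expected)
```

These match the standard library's `statistics.pstdev` and `statistics.stdev`, respectively, which implement exactly the two formulas above.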
The reason for the change in formula with the sample is this: When you're calculating $s$ you are normally using $s^2$ (the sample variance) to estimate $\sigma^2$ (the population variance). The problem, though, is that if you don't know $\sigma$ you generally don't know the population mean $\mu$, either, and so you have to use $\bar{x}$ in the place in the formula where you normally would use $\mu$. Doing so introduces a slight bias into the calculation: Since $\bar{x}$ is calculated from the sample, the values of $x_i$ are on average closer to $\bar{x}$ than they would be to $\mu$, and so the sum of squares $\sum_{i=1}^N (x_i - \bar{x})^2$ turns out to be smaller on average than $\sum_{i=1}^N (x_i - \mu)^2$. It just so happens that that bias can be corrected by dividing by $N-1$ instead of $N$. (Proving this is a standard exercise in an advanced undergraduate or beginning graduate course in statistical theory.) The technical term here is that $s^2$ (because of the division by $N-1$) is an unbiased estimator of $\sigma^2$.
Another way to think about it is that with a sample you have $N$ independent pieces of information. However, since $\bar{x}$ is the average of those $N$ pieces, if you know $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_{N-1} - \bar{x}$, you can figure out what $x_N - \bar{x}$ is. So when you're squaring and adding up the residuals $x_i - \bar{x}$, there are really only $N-1$ independent pieces of information there. So in that sense perhaps dividing by $N-1$ rather than $N$ makes sense. The technical term here is that there are $N-1$ degrees of freedom in the residuals $x_i - \bar{x}$.
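The unbiasedness claim can be checked exactly on a toy example rather than by simulation. Assuming sampling with replacement from the population $\{0, 1, 2\}$ (so that the observations are independent), we can enumerate every possible sample of size $n = 2$ and average the two candidate variance estimates over all of them:

```python
from itertools import product

population = [0, 1, 2]
mu = sum(population) / len(population)                              # 1.0
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # 2/3

n = 2
biased, unbiased = [], []
for sample in product(population, repeat=n):   # all 9 ordered samples with replacement
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    biased.append(ss / n)          # divide by n
    unbiased.append(ss / (n - 1))  # divide by n - 1

print(sum(unbiased) / len(unbiased))  # 0.666... = sigma^2: unbiased
print(sum(biased) / len(biased))      # 0.333...: underestimates sigma^2
```

Averaged over all possible samples, the $n-1$ version recovers $\sigma^2$ exactly, while dividing by $n$ systematically falls short.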
For more information, see Wikipedia's article on the sample standard deviation.
At the root, the issue here seems to be whether to use the z-statistic or the t-statistic in finding a confidence interval for the population mean $\mu$ or in testing a hypothesis about $\mu.$
Suppose $X_1, X_2, \dots, X_n$ is a random sample from a normal population of which both the mean $\mu$ and the standard deviation $\sigma$ are unknown. We wish to find a 95% confidence interval (CI) for $\mu.$
If we knew $\sigma$ then $$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim Norm(0, 1).$$ Thus $$P\left\{-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right\} = 0.95,$$ in which $\mu$ can be isolated in a few steps of algebra to $$P\{\bar X - 1.96\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\sigma/\sqrt{n}\} = 0.95.$$ And so we say that a 95% CI for $\mu$ is $\bar X \pm 1.96\sigma/\sqrt{n},$ in which all quantities $\bar X, \sigma,$ and $n$ are known. The numbers $\pm 1.96$ are chosen because they cut 2.5% probability from the upper and lower tails of the standard normal distribution, leaving 95% in the center.
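As a quick sketch of the z-interval computation, with hypothetical numbers ($\sigma = 15$ known, a sample of $n = 36$ with $\bar X = 100$):

```python
import math

# Hypothetical values: known sigma, observed sample mean and size.
xbar, sigma, n = 100.0, 15.0, 36
z_star = 1.96                       # cuts 2.5% from each tail of Norm(0, 1)
margin = z_star * sigma / math.sqrt(n)
ci = (xbar - margin, xbar + margin)
print(ci)  # (95.1, 104.9)
```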
In case $\sigma$ is unknown, it is convenient to use the sample standard deviation $S$ instead, claiming that $\bar X \pm 1.96 S/\sqrt{n}$ (or perhaps $\bar X \pm 2 S/\sqrt{n}$) is an approximate 95% CI for $\mu.$ If $n \ge 30,$ this approximation is pretty good, for reasons we see just below.
If $\sigma$ is not known, the exact distribution is
$$T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim T(n-1),$$
Student's t distribution with $n-1$ degrees of freedom.
Then an exact 95% CI for $\mu$ is $\bar X \pm t^* S/\sqrt{n},$
where $t^*$ cuts 2.5% of probability from the upper tail of $T(n-1)$ and, by symmetry, $-t^*$ cuts 2.5% from the lower tail. Looking at tables of the t distribution we see that for $n \ge 30$ (that is, $n-1 \ge 29$ degrees of freedom), $t^*$ is approximately 2.0. So the approximate procedure with the standard normal distribution and the exact procedure with Student's t distribution amount to about the same thing.
For smaller values of $n$, the values of $t^*$ get noticeably larger. For example if $n = 10$, we have $t^* = 2.262.$ Thus the 95% CI gets longer (less precise). You can think of this loss of precision as a 'penalty' for having to estimate $\sigma$ by $S$ instead of knowing the exact value of $\sigma.$
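To see the size of this penalty concretely, here is a sketch comparing the two interval half-widths for $n = 10$, using the $t^* = 2.262$ quoted above; the value of $S$ is a made-up example:

```python
import math

S, n = 8.0, 10                     # hypothetical sample standard deviation and size
z_star, t_star = 1.96, 2.262       # normal cut-off vs. t cut-off for 9 df
z_margin = z_star * S / math.sqrt(n)
t_margin = t_star * S / math.sqrt(n)
print(t_margin / z_margin)  # about 1.154: the t interval is roughly 15% longer
```

The ratio depends only on $t^*/z^*$, so for $n = 10$ the t-based CI is about 15% longer regardless of the data.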
There are a few good reasons to forget the 'rule of 30' altogether:
First, it 'works' only for 95% CIs. For a 99% CI we need to cut 0.5% of probability from each tail: the normal cut-off value is $z^* = 2.576$ and we need to increase the sample size to about $n = 60$ before $t^* \approx 2.6.$
Second, in using statistical software, either we know the exact value of $\sigma$ or the program will estimate it from the data as $S.$ From the start, we have to know whether we are doing a z-interval or a t-interval. Using an unnecessary rule about sample size only confuses the issue. The correct rule is: use z-procedures if $\sigma$ is known (and it usually isn't in practice); use t-procedures if not.
Third, some authors of elementary books try to use the 'rule of 30' (without any theoretical justification) for various kinds of limiting procedures, applicability of the Central Limit Theorem, safe use of t-procedures for non-normal data, and so on. In these applications, 30 is seldom an appropriate dividing line.
Best Answer
My answer based on the second version of the original post:
According to the central limit theorem, the standard deviation of the sample mean of $n$ observations from a population is $\sigma_{\overline{X}}=\sigma_X/\sqrt{n}$, where $\sigma_X$ is the population standard deviation. In your case, $\sigma_{\overline{X}}=40/\sqrt{100}=4$.
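The arithmetic, using the numbers from the question ($\sigma_X = 40$, $n = 100$):

```python
import math

sigma_X = 40.0   # population standard deviation (from the question)
n = 100          # sample size (from the question)
sigma_xbar = sigma_X / math.sqrt(n)   # standard deviation of the sample mean
print(sigma_xbar)  # 4.0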
My answer based on the first and third versions of the original post:
In order to "get the sample standard deviation," you need to specify a sample (a subset of the population). Without a specified sample there is no sample standard deviation to compute; once a sample is specified, you can compute it directly. In either case, knowledge of the population standard deviation is irrelevant.
For example, consider a population $\{0,1,2,3,\ldots,k\}$ where $k$ is, say, some integer greater than 3. Even if I told you what the population standard deviation was, there is still no way to find the sample standard deviation because no sample was specified.
Now consider a sample $\{0,1,2\}$. The sample mean is $(0+1+2)/3=1$, the sample variance is $s_X^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\overline{x})^2=\frac{1}{3-1}[(0-1)^2+(1-1)^2+(2-1)^2]=1$, and the sample standard deviation is $s_X=1$, regardless of what the population standard deviation is.
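The same numbers fall out of Python's standard library, whose `statistics.variance` and `statistics.stdev` divide by $n-1$ exactly as in the formula above:

```python
import statistics

sample = [0, 1, 2]
print(statistics.mean(sample))      # 1
print(statistics.variance(sample))  # 1  (divides by n - 1)
print(statistics.stdev(sample))     # 1.0
```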