Solved – Calculate standard deviation given mean and percentage

normal distribution, standard deviation

Some values have a normal distribution with mean .0276. What standard deviation is required so that 98% of values are between .0275 and .0278?

What confuses me is how to calculate the standard deviation when Z lies between two bounds at different distances from the mean. I know that P(-.0001/σ < Z < .0002/σ) = .98, but I don't know where to go from here.

Best Answer

We can solve this problem almost instantly in our heads using the "68-95-99.7" rule. I will explain the process in detail because that is what matters. The answer is of little interest: the point of this question is to help us learn to think about probability distributions.

These numbers in the 68-95-99.7 rule are (approximately) the percent chances that a Normal variable lies within one, two, and three standard deviations of its mean. By subtracting these numbers from 100% it follows that the chances of a Normal variable lying beyond one, two, and three SDs of its mean are about 32, 5, and 0.3 percent, respectively. Since this distribution is symmetric, we can split each of these numbers in half to find the chances of lying beyond one, two, and three SDs of the mean in a given direction: the values are about 16, 2.5, and 0.15 percent, respectively. (Slightly more accurate values are shown in the figure.)
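These percentages are easy to check numerically. Here is a minimal sketch in R that uses only the built-in `pnorm`; the variable names are my own and not part of the original answer.

```r
# Multiples of the standard deviation to examine.
k <- 1:3

# Chance that a Normal variable lies within k SDs of its mean.
within_k <- pnorm(k) - pnorm(-k)

# Chance of lying beyond k SDs in one given direction (a single tail).
one_tail <- pnorm(-k)

print(round(100 * within_k, 2))  # 68.27 95.45 99.73
print(round(100 * one_tail, 2))  # 15.87  2.28  0.13
```

The one-tail values are the "slightly more accurate" versions of the 16, 2.5, and 0.15 percent figures obtained by halving.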

The figure uses areas to represent chances. The leftmost value of 16%, for instance, is the proportion of all the area under the curve that lies to the left of -1. The "tail areas" associated with the numbers $Z = -3,-2,-1, 1,2,3$ are labeled. (These areas overlap; for instance, the 16% values include regions accounted for by the 2.3% and 0.13% values.)

People who think effectively about probabilities use mental figures like this one.

Turn to the data in the question: 0.0275 is 0.0001 to the left of the mean of 0.0276 while 0.0278 is 0.0002 to the right of the mean: twice as far. We therefore need to enclose 98% of the probability between an unknown number of standard deviations to the left of the mean--call this multiple $-Z$ to indicate it's to the left--and twice that number of standard deviations to the right of the mean, which therefore is $2Z.$
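In symbols, the requirement can be written as follows. The notation is my own addition: $X$ is a value, $\sigma$ is the unknown standard deviation, and $\Phi$ is the standard Normal distribution function.

$$\Pr(0.0275 < X < 0.0278) = \Pr\left(-\frac{0.0001}{\sigma} < \frac{X - 0.0276}{\sigma} < \frac{0.0002}{\sigma}\right) = \Phi(2Z) - \Phi(-Z) = 0.98, \qquad Z = \frac{0.0001}{\sigma}.$$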

Equivalently, 100 - 98 = 2% of the probability must lie beyond this range. The figure shows 2.3% of the probability lies to the left of $-Z=-2$ and essentially 0% lies to the right of $2Z=2\times 2=4,$ so $Z=2$ would be an accurate guess (albeit a tad low).
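To see how good the guess $Z=2$ is, we can compute the probability it actually encloses. This is a small sketch; `Z_guess` and `coverage` are names I made up for illustration.

```r
# The mental-arithmetic guess from the 68-95-99.7 rule.
Z_guess <- 2

# Probability between Z SDs to the left and 2Z SDs to the right of the mean.
coverage <- pnorm(2 * Z_guess) - pnorm(-Z_guess)

print(coverage)  # about 0.9772, a little short of the required 0.98
```

Because the enclosed probability falls slightly short of 98%, the interval must be a little wider in standard deviation units, which is why $Z=2$ is a tad low.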

The only arithmetic needed to get to this point involved subtractions, one division (of 0.0002 / 0.0001) and halving.

If you would like to get a little closer to "the" answer, look up (or compute) the value of $Z$ for which 2% of the probability is to the left of $-Z$: that's $Z=2.054.$ It's still the case that essentially 0% is to the right of $2Z \approx 4.1.$ (Because there actually is a tiny bit of probability beyond $4.1,$ the correct value of $Z$ must be just a tiny bit more than $2.054.$)
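The table lookup can be reproduced with `qnorm`, the inverse of `pnorm`. The following sketch uses hypothetical variable names and also shows how little probability lies beyond $2Z.$

```r
# Z for which 2% of the probability lies to the left of -Z.
Z_table <- -qnorm(0.02)

# Probability to the right of 2Z (by symmetry, the area left of -2Z).
upper_tail <- pnorm(-2 * Z_table)

print(round(Z_table, 3))  # 2.054
print(upper_tail)         # tiny: on the order of 1e-05
```

That small upper tail is the reason the exact $Z$ is slightly larger than the table value.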

Either way, we come up with the result that $Z$ is somewhere around $2$ or $2.054.$

Finally, return to the data in the problem: $Z$ standard deviations equals $0.0001$ (or $2Z$ standard deviations equals $0.0002:$ it's all the same). Our answers therefore are

Quick and dirty, based on the 68-95-99.7 rule: $0.0001/2 = 0.00005.$
A little more refined, based on a table lookup: $0.0001/2.054 \approx 0.0000486\,86.$

We know both of these answers will be a little too large, but the second must be quite accurate.

Having gone through this thought process, we could write down the following R commands immediately because they directly carry out the calculation (albeit more accurately):

(Z <- uniroot(function(z) pnorm(2*z)-pnorm(-z) - 0.98, c(2,3))$root)

[1] 2.054158

That agrees with the three decimal digit table I used to get $2.054.$

(0.0276 - 0.0275) / Z

[1] 4.868176e-05

It agrees with our first answer almost to two significant figures and with the second answer almost to four significant figures--more than we really deserve.
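As a final sanity check, we can plug the standard deviation reported above back into the original problem and confirm that about 98% of the values fall in the stated range. This is only a sketch; `mu`, `sigma`, and `enclosed` are illustrative names.

```r
# Mean from the question and the standard deviation found above.
mu <- 0.0276
sigma <- 4.868176e-05

# Probability that a value lies between 0.0275 and 0.0278.
enclosed <- pnorm(0.0278, mean = mu, sd = sigma) -
  pnorm(0.0275, mean = mu, sd = sigma)

print(enclosed)  # very close to 0.98
```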

Related Solutions

Mean Absolute Deviation vs Standard Deviation – Key Differences Explained

Both answer how far your values are spread around the mean of the observations.

An observation that is 1 under the mean is equally "far" from the mean as a value that is 1 above the mean. Hence you should neglect the sign of the deviation. This can be done in two ways:

Calculate the absolute value of the deviations and sum these.
Square the deviations and sum these squares. Squaring gives more weight to large deviations, so this sum will differ from the sum of the absolute deviations.

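After calculating the "sum of absolute deviations", you divide it by the number of observations to get the "mean deviation". After calculating the "sum of squared deviations", you divide it by the number of observations and take the square root to get the "standard deviation" (the sample version divides by one less than the number of observations).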

The mean deviation is rarely used.
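A small worked example may make the difference concrete. The data vector below is made up for illustration. Note that R's built-in `sd` uses the sample (n - 1) divisor, and that `mad` computes a scaled median absolute deviation rather than the mean deviation, so the mean deviation is computed by hand here.

```r
# Hypothetical data with mean 5.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Deviations from the mean.
dev <- x - mean(x)

# Mean (absolute) deviation: average of the absolute deviations.
mean_abs_dev <- mean(abs(dev))   # 1.5

# Standard deviation, dividing by n: root of the average squared deviation.
sd_pop <- sqrt(mean(dev^2))      # 2

# Sample standard deviation, dividing by n - 1 (what sd() returns).
sd_sample <- sd(x)               # about 2.14

print(c(mean_abs_dev, sd_pop, sd_sample))
```

The standard deviation exceeds the mean deviation here because the two largest deviations (3 and 4) dominate the sum of squares.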

Solved – Normal Distribution with random mean and standard deviation

Your solution is correct, assuming the two normal random variables are independent. According to the R documentation of rnorm, you can input a vector of means and standard deviations for the mean and sd arguments respectively.

To verify, consider this toy example:

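n <- 3
mean_vector <- c(0,10,100)   # one mean per variate
sd_vector <- c(1,1,1)        # one standard deviation per variate

# The i-th variate is drawn with the i-th mean and the i-th sd
rnorm(n, mean=mean_vector, sd=sd_vector)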

Some output:

[1]  1.049676 11.566033 98.481899
[1] -1.374753  9.078215 99.465803
[1]  3.056377  9.837055 98.842553

Clearly the first variate for each simulation is $N(0,1)$ distributed, the second is $N(10,1)$ distributed, and the third is $N(100,1)$ distributed.
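Three draws are suggestive rather than conclusive. A slightly more convincing check, sketched below with an arbitrary seed and replication count of my choosing, repeats the draw many times and compares the column means and standard deviations with the inputs.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

mean_vector <- c(0, 10, 100)
sd_vector <- c(1, 1, 1)

# Each replicate is one vectorized draw; rows are replicates after t().
sims <- t(replicate(10000, rnorm(3, mean = mean_vector, sd = sd_vector)))

# Column summaries should be close to the supplied means and sds.
col_means <- colMeans(sims)
col_sds <- apply(sims, 2, sd)

print(round(col_means, 2))
print(round(col_sds, 2))
```

If the vectorized arguments behave as documented, the first column should center near 0, the second near 10, and the third near 100, each with a spread near 1.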
