How to calculate the range that 90% of the values fall into

descriptive statistics

Given a series of values, I know that 68% of the values fall within one standard deviation and that 95% fall within 2 standard deviations, but how do I calculate the range within which 90% of the values will fall? Let's call that 90% range the typical range.

Best Answer

Given a series of values, I know that 68% of the values fall within one standard deviation and that 95% fall within 2 standard deviations,

This won't be true as a general statement, sometimes not even approximately.

-- aside

To clarify -

@whuber points out that they're often good approximations - which is true. (The 2 s.d. case in particular seems to be pretty robust; if you have unimodal continuous data that's not too asymmetric, the 2 s.d. rule can't be out by more than about 7%.) But stated as a general claim, as I attempted to point out, it's not the case. It can be that they're not even roughly in the ballpark. I have encountered real data (very often!) that gets very close to 100% inside 1 s.d. of the mean - indeed, there's some in my R session right now that I was playing with just a short time ago (some insurance data). I have also encountered (a few times) data that had very close to 0% within 1 s.d. of the mean. For example, I've seen Likert scale data that was very close to evenly split at the extremes, with only a very small percentage in the inner categories between them. It really happens.

Consider two samples of 100 observations. One has 98 0's and a -1 and a 1. The other has two zeros and 49 -1's and 49 1's. The first has 98% of the data within one s.d. of the mean, while the second has 2% within one s.d. of the mean. (The real data I've seen isn't quite so 'neat', but the percentages aren't so different.)

If you use R, these vectors contain those samples:

x1 <- c(-1, 1, rep(0, 98))
x2 <- c(0, 0, rep(-1, 49), rep(1, 49))
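
To check those percentages, something like the following should do. It's only a quick sketch: prop_within is a helper name made up for this illustration, not a built-in function, and it uses the sample mean and the usual (n-1) sample standard deviation.

```r
x1 <- c(-1, 1, rep(0, 98))
x2 <- c(0, 0, rep(-1, 49), rep(1, 49))

# Hypothetical helper: proportion of observations lying within
# k sample standard deviations of the sample mean
prop_within <- function(x, k = 1) {
  mean(abs(x - mean(x)) <= k * sd(x))
}

p1 <- prop_within(x1)  # 0.98: only the two nonzero values fall outside
p2 <- prop_within(x2)  # 0.02: only the two zeros fall inside
print(c(p1, p2))
```

In both samples the mean is 0, so the comparison is just between the absolute values and the s.d. (about 0.14 for x1 and about 0.995 for x2).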

-- end aside

You appear to have left some conditions out.

how do I calculate the range whereby 90% of the values will fall. Let's call that 90% range the typical range

If you mean 'for normally distributed data' (which you had better specify), and you want a symmetric interval, then you just find the 5th and 95th percentiles of the normal distribution. Those are about 1.645 s.d.'s either side of the mean.

You can look them up in normal tables, or you can use a program to find them for you.
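
In R, for instance, qnorm gives those percentiles directly. A minimal sketch, assuming the data really are normal; the mean of 100 and s.d. of 15 are made-up values purely for illustration (in practice you would plug in your own estimates):

```r
# 5th and 95th percentiles of the standard normal: about -1.645 and 1.645
z <- qnorm(c(0.05, 0.95))

# Made-up mean and s.d., for illustration only
m <- 100
s <- 15

# Central 90% range under the normal assumption: mean +/- 1.645 s.d.
typical_range <- m + z * s

# The same thing, asking qnorm for the percentiles of N(m, s^2) directly
typical_range2 <- qnorm(c(0.05, 0.95), mean = m, sd = s)

print(z)
print(typical_range)
```

With those illustrative numbers the range comes out at roughly 75.3 to 124.7.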

--

Edit: It's also possible to generate nonparametric intervals based on sample quantiles, such as tolerance intervals or prediction intervals (depending on the precise form of the probability statement you wish to make). These are less efficient than a parametric interval when your data really does have the parametric form you assume, but they don't rely on any parametric assumption.
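
As a rough illustration of the quantile-based idea, the simplest version just takes the 5th and 95th sample percentiles. Note this is a plain empirical range, not a formal tolerance or prediction interval (those attach a confidence statement and choose the order statistics accordingly). The data below are simulated and skewed, purely as an assumption for the example.

```r
set.seed(1)

# Simulated right-skewed data, for illustration only
x <- rexp(1000)

# Empirical central 90% range: 5th and 95th sample percentiles
emp_range <- quantile(x, probs = c(0.05, 0.95))

# Proportion of the sample inside that range (about 0.90 by construction)
inside <- mean(x >= emp_range[1] & x <= emp_range[2])

print(emp_range)
print(inside)
```

Unlike the mean +/- 1.645 s.d. range, this one need not be symmetric about the mean, which is what you'd want for skewed data like these.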