Reliability of mean of standard deviations

reliability, standard deviation

I have a question which is probably going to show my ignorance about statistics :). I have a large set of machines that produce iron bars of certain lengths. For each machine, I have run experiments and have a list of lengths. From those I can calculate a mean and a sample standard deviation. I don't really care about the means; I am mainly interested in the variation. Therefore, I basically only record a sample standard deviation per machine. I think the results of each machine follow a normal distribution. So far so good 🙂

I now want to combine these variations into a single number. Therefore, I calculate the quadratic average (root mean square) of the per-machine standard deviations; let's call it X. In the next step, I would also like to give an estimate for the spread around X. What is this number called and what's the best way to calculate it? I'm not sure whether it's related to the confidence interval of a standard deviation, and I don't know whether the measurements are independent (a design fault would show up in all machines, a construction fault maybe only in some).

Example. I'll try to clarify with an example. Suppose I measure 3 machines and find that they produce lengths of
M1: 100 +/- 7
M2: 120 +/- 8
M3: 130 +/- 9
where the numbers behind the +/-'s are the sample standard deviations of the observed values on that single machine. As said before, I don't care about the means but only about the spread, so I define {X_1, X_2, X_3} = {7,8,9}. Their quadratic average is X = RMS(X_i) = $\sqrt{194/3} \approx 8.04$, and I think of X as an indication of the average spread of a machine in my park.

Suppose that I had instead found {X_1, X_2, X_3} = {3,8,11}. Their quadratic average is the same, $\sqrt{194/3} \approx 8.04$, but the spread around it is obviously bigger. My confidence in the correctness of this value as the average spread of a machine should therefore be lower (I'd like to test some more machines, for instance), and I would like to express this in a number.
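To make the arithmetic of the example concrete, here is a minimal Python sketch that computes the quadratic average of both sets and, as one possible measure of the spread between machines, the sample standard deviation of the X_i. The helper name `rms` and the choice of the sample standard deviation as the spread measure are illustrative assumptions, not something fixed by the question.

```python
import math
import statistics


def rms(values):
    """Quadratic mean: square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Per-machine sample standard deviations from the two scenarios above.
set_a = [7, 8, 9]
set_b = [3, 8, 11]

for name, sds in (("A", set_a), ("B", set_b)):
    # Same RMS for both sets, but a different spread between machines.
    print(name, "RMS:", rms(sds), "spread of X_i:", statistics.stdev(sds))
```

Both sets share the same sum of squares (194), so the RMS is identical, while the spread between the machines differs clearly.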

Questions
Some answers to questions: the machines aren't identical; if a machine really misbehaves I could see it directly from the machine test (i.e. I would see it from a large X_i), but I wouldn't detect a small misbehavior. Furthermore, the sample size for each machine could be different (I have more tests on my old machines than on my new machines).

Best Answer

If you want to test whether the variances of some machines deviate from the variances of the others, combining them into an average will not help you. The problem is that the differing variances will skew your average. To test whether the variances differ you can use Bartlett's test. It is sensitive to departures from normality, but since you said that your data are normal this should not be a problem, though it would be a good idea to check that.
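Since only a standard deviation per machine is recorded, Bartlett's statistic can be computed from those summary values together with the per-machine sample sizes. The sketch below is one way to do that with the standard library only; the function name `bartlett_statistic` and the sample sizes of 20 per machine are hypothetical, chosen purely for illustration. Under the null hypothesis of equal variances the statistic is approximately chi-square distributed with k - 1 degrees of freedom, and for k = 3 machines (2 degrees of freedom) the p-value has the closed form exp(-T/2).

```python
import math


def bartlett_statistic(sds, sizes):
    """Bartlett's test statistic from per-group sample SDs and sample sizes."""
    k = len(sds)
    dofs = [n - 1 for n in sizes]          # degrees of freedom per machine
    total_dof = sum(dofs)                  # N - k
    # Pooled variance: degrees-of-freedom-weighted mean of the variances.
    pooled_var = sum(d * s * s for d, s in zip(dofs, sds)) / total_dof
    numerator = total_dof * math.log(pooled_var) - sum(
        d * math.log(s * s) for d, s in zip(dofs, sds)
    )
    correction = 1 + (sum(1 / d for d in dofs) - 1 / total_dof) / (3 * (k - 1))
    return numerator / correction


sizes = [20, 20, 20]  # hypothetical number of measurements per machine

for sds in ([7, 8, 9], [3, 8, 11]):
    stat = bartlett_statistic(sds, sizes)
    # Chi-square survival function with 2 degrees of freedom (valid for k = 3 only).
    p_value = math.exp(-stat / 2)
    print(sds, "T =", stat, "p =", p_value)
```

With more than three machines the p-value needs a general chi-square distribution function from a statistics library instead of the closed form used here.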

Now, if you can assume that all the machines are similar, in the sense that they may have different means but similar variances, the problem is very simple. If you also assume that the machines are independent, treat the variances from each machine as a random sample and estimate the mean and standard deviation of this sample. For a large number of machines the normal approximation will kick in, so it will not matter whether you use the standard deviations or the variances. In both cases the sample mean $\mu$ estimates the average of the statistic of your choice, and the sample standard deviation $\sigma$ estimates the spread of that statistic across machines. Roughly 95% of machines will then fall within $\mu\pm 1.96\sigma$, and an approximate 95% confidence interval for the average itself, with $n$ machines, is $\mu\pm 1.96\sigma/\sqrt{n}$.
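The procedure described above can be sketched in a few lines of Python. The function name `summarize` and the returned dictionary keys are hypothetical, and the example uses the per-machine variances from the question; with only three machines the normal approximation is not trustworthy, so the numbers illustrate the mechanics only.

```python
import math
import statistics


def summarize(values, z=1.96):
    """Mean, spread, and an approximate 95% interval for the mean of per-machine statistics."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # spread of the statistic across machines
    se = sd / math.sqrt(n)             # standard error of the mean
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci_mean": (mean - z * se, mean + z * se),
    }


# Per-machine variances (squared sample standard deviations) from the question.
narrow = summarize([s ** 2 for s in (7, 8, 9)])
wide = summarize([s ** 2 for s in (3, 8, 11)])

print(narrow)
print(wide)
```

Both sets give the same mean variance, but the second yields a much wider interval, which is the lower confidence the question asks to express as a number.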
