Solved – Modify standard deviation with fixed mean

Tags: mean, outliers, standard deviation, variance

I have a set of pipes that are supposed to receive a balanced load from a source. For instance, with 100 inputs and 5 pipes, each pipe is supposed to handle 20. In reality that's not the case, and the loads normally look like:

[23, 26, 19, 18, 14]

I'm working on a system to trigger an alarm if one of the pipes is underloaded or overloaded, so clearly I'm looking for outliers. I found that standard deviation would be a good fit for detecting that, but I wonder whether using the mean of the observed (random) loads might skew the result away from the expected value (here 20).

So the question is, if instead of the mean I use 20 and calculate the SD based on that, how correct the final result will be? Or am I completely confused?

A few cases that might trigger the alarm:

[20, 20, 20, 3, 37]
[20, 0, 20, 20, 40]

Best Answer

To answer the question itself:

So the question is, if instead of the mean I use 20 and calculate the SD based on that, how correct the final result will be?

Since the total flow is always 100, the sample mean will always be 20. As a result, the standard deviation calculation gives exactly the same value whether you plug in the known mean or the sample mean.
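A quick way to convince yourself is to run both calculations on the example loads from the question. This is only a minimal sketch: the helper name `sd_about` is made up for illustration, and it divides by $n$ (the choice of denominator does not affect the comparison).

```python
import math

def sd_about(loads, center):
    """Root-mean-square deviation of the loads about `center` (divides by n)."""
    return math.sqrt(sum((x - center) ** 2 for x in loads) / len(loads))

loads = [23, 26, 19, 18, 14]            # example from the question; total is 100
known_mean = 20                         # expected load per pipe
sample_mean = sum(loads) / len(loads)   # also 20, because the total is fixed

sd_known = sd_about(loads, known_mean)
sd_sample = sd_about(loads, sample_mean)

print(sample_mean, sd_known, sd_sample)
```

Because the loads always add up to 100, `sample_mean` cannot be anything other than 20, so the two SD values coincide for any load vector, not just this one.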

What might change is the denominator (if you cared about an unbiased variance estimate for some reason), but if you're just comparing against some yet-to-be-determined cutoff there's no particular reason to worry about that. Even if you did divide by $n=5$ rather than $n-1=4$ to make the variance unbiased, all you'd need to do is adjust the cutoff the same way, and performance would again be identical either way. The numerator of the variance is the thing that puts a partial order on sets of pipe loads; the rest is really just scaling. Indeed, since the mean is fixed, one could just as well use the sum of squares of the loads as the criterion (that statistic has been used in similar contexts where uniformity is an issue).
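To spell out why the raw sum of squares works as a criterion, expand the numerator of the variance with $n=5$ and the mean fixed at $\bar{x}=20$:

$$\sum_{i=1}^{n}(x_i-\bar{x})^2 \;=\; \sum_{i=1}^{n}x_i^2 - n\bar{x}^2 \;=\; \sum_{i=1}^{n}x_i^2 - 5\cdot 20^2 \;=\; \sum_{i=1}^{n}x_i^2 - 2000.$$

The two quantities differ only by the constant 2000, so they rank load vectors identically. For the example $[23, 26, 19, 18, 14]$, the sum of squares is $529+676+361+324+196=2086$ and the sum of squared deviations is $9+36+1+4+36=86=2086-2000$.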

I think this particular issue is about as not-really-an-issue as they come. If the standard deviation really does what you want, just think of it in terms of whichever of the two calculations is easier for you to reason about, since it makes no difference. Once you're sure the SD does what you want, the next trick is to figure out a suitable cutoff value for your alarm.
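As a rough illustration of what the alarm check could look like once a cutoff is chosen, here is a small sketch run on the load vectors from the question. The cutoff of 8.0 is a purely hypothetical placeholder picked so that these examples separate; it is not a recommendation, and the names `load_sd` and `should_alarm` are made up for this example.

```python
import math

EXPECTED = 20   # per-pipe target from the question (100 inputs over 5 pipes)
CUTOFF = 8.0    # hypothetical threshold, for illustration only

def load_sd(loads, expected=EXPECTED):
    """SD of the loads about the expected per-pipe load (divides by n)."""
    return math.sqrt(sum((x - expected) ** 2 for x in loads) / len(loads))

def should_alarm(loads, cutoff=CUTOFF):
    """True when the spread of the loads exceeds the chosen cutoff."""
    return load_sd(loads) > cutoff

cases = [
    [23, 26, 19, 18, 14],   # the "normal" example from the question
    [20, 20, 20, 3, 37],    # cases the question expects might trigger the alarm
    [20, 0, 20, 20, 40],
]

for loads in cases:
    print(loads, round(load_sd(loads), 2), should_alarm(loads))
```

With these inputs the SDs come out at roughly 4.15, 10.75 and 12.65, so any cutoff between the first and second values would flag the two problem cases while leaving the typical one alone. Where exactly to put it depends on how much imbalance you are willing to tolerate in practice.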