Solved – How to calculate the probability that a pattern violating the “control chart rules” occurs

control chart, poisson distribution, r

I am going to host a training session to teach healthcare staff how to use control charts (the c-chart, to be specific), and I need to explain why the Western Electric Rules deserve to be called rules: when one of these patterns is matched, the chance of it happening by coincidence is low enough that an outbreak can be suspected.

I have read the article "Control Charts 101: A Guide to Health Care Applications", which helped a bit: it says the chance of a single data point exceeding the UCL is less than 0.5%. I can verify that with the following R code without a problem:

# one-sided tail probability beyond the 3-sigma UCL for a normal distribution
1 - pnorm(3, mean = 0, sd = 1)
[1] 0.001349898

I have written the following R code to verify the same thing for the c-chart (i.e. a Poisson rather than a normal distribution), with lambda ranging from 0.1 to 100.0:

# For each lambda, compute the probability that a Poisson count exceeds the
# c-chart UCL of lambda + 3*sqrt(lambda), with the 3*sqrt(lambda) allowance
# rounded up to a whole count; UCL collects these exceedance probabilities.
UCL <- numeric(0)

for (loop.UCL in 1:1000) {
    lambda <- loop.UCL / 10
    result <- 1 - ppois(lambda + ceiling(3 * sqrt(lambda)), lambda)
    UCL <- c(UCL, result)
}

summary(UCL)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0009679 0.0016160 0.0018260 0.0019960 0.0021220 0.0134600
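
The same check can also be written without the loop; this vectorized version (p.exceed is just an illustrative name) should give the same numbers:

lambda <- (1:1000) / 10
p.exceed <- 1 - ppois(lambda + ceiling(3 * sqrt(lambda)), lambda)   # same UCL convention as above
summary(p.exceed)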

I think I can handle the first Western Electric Rule, but I am puzzled by the second and third.

Can anyone give me some insight or a hint? Thanks!

Best Answer

Two figures of merit in control charting are (1) the expected length of time the process will appear to remain in control when in fact it is; and (2) the expected length of time it takes for an out-of-control (OOC) condition to be detected after the process first moves out of control.

Under the usual assumptions (iid normally distributed values, no serial correlation, etc.), we can reduce the first case to analyses of correlated coin-flipping experiments. An accurate solution takes some work; people usually run simulations (a small sketch appears further below). However, each rule by itself has a simple interpretation:

Rule 1 characterizes each measurement by whether it lies beyond the interval $[-3\sigma, 3\sigma]$ (with 0.27% probability) or not. It corresponds, then, to flipping a coin with $\Pr(\text{heads}) = 0.0027$, and we want to know the expected number of flips before a "heads" (OOC condition) is observed.
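
As a quick check, both numbers are easy to reproduce in R; the waiting time for the first "heads" is geometric, so its expectation is 1/p:

p1 <- 2 * (1 - pnorm(3))   # P(point falls outside [-3*sigma, 3*sigma]), about 0.27%
1 / p1                     # expected number of points before a rule 1 signal, about 370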

Rule 2 characterizes each measurement by whether it exceeds $2\sigma$ or falls below $-2\sigma$. This is like a "3-sided" coin: a multinomial distribution. One face says "above $2\sigma$" and occurs with probability 2.28%. Let's call this "heads 1". Another face says "below $-2\sigma$" and also occurs with probability 2.28%. Call this "heads 2". The third face says "between $-2\sigma$ and $2\sigma$" and occurs with probability 95.45%. The analogous question concerns the expected number of flips with this coin before a sequence of two heads of the same type is observed. The calculation might not be easy, but it's easy to see this event is fairly rare: the chance of either head appearing is just 4.55%, and given that one just appeared, the chance that a head of the same type immediately follows it is only 2.28%. Thus, if we only had a pair of throws to consider, an OOC event of this type would occur with probability 4.55% * 2.28% (the first factor already covers both types of "heads") ≈ 0.10%.
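
The pair calculation translates directly into R:

p.head <- 1 - pnorm(2)   # one particular type of "heads", about 2.28%
2 * p.head               # either type of "heads", about 4.55%
2 * p.head^2             # two consecutive heads of the same type, about 0.10%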

Rule 3 can be analyzed in a similar manner (but is more complicated).
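
For a rough sense of the numbers, take rule 3 to mean "four consecutive points beyond $1\sigma$ on the same side" (the simplified version used in the example further below; the textbook Western Electric rule uses a "4 of 5" window). The in-control probability for a given run of four is then:

p.side <- 1 - pnorm(1)   # P(point beyond 1 sigma on a given side), about 15.9%
2 * p.side^4             # four in a row on the same side, about 0.13%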

Note that the rules are interrelated: a single observation can violate two or even all three rules, even though all preceding observations were in control. However, this has fairly low probability of occurring, so to a good approximation we can assume the rules are mutually exclusive (allowing us to sum their probabilities).
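
A small simulation illustrates both points: it estimates the in-control average run length (ARL) with all three rules applied together, which can be compared against 1 divided by the sum of the per-point probabilities above. This is only a sketch, and it uses the simplified rule versions described in this answer (two in a row beyond $2\sigma$ on the same side, four in a row beyond $1\sigma$ on the same side) rather than the textbook "2 of 3" and "4 of 5" windows:

set.seed(1)

run.length <- function(max.n = 1e4) {
    x <- rnorm(max.n)                    # in-control iid N(0,1) measurements
    for (i in seq_len(max.n)) {
        if (abs(x[i]) > 3) return(i)                                              # rule 1
        if (i >= 2 && (all(x[(i-1):i] > 2) || all(x[(i-1):i] < -2))) return(i)    # rule 2 (simplified)
        if (i >= 4 && (all(x[(i-3):i] > 1) || all(x[(i-3):i] < -1))) return(i)    # rule 3 (simplified)
    }
    max.n                                # no signal within max.n points
}

arl <- mean(replicate(2000, run.length()))
arl   # estimated in-control ARL; roughly comparable to 1/(0.0027 + 0.0010 + 0.0013), i.e. about 200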

The purpose of rules 2 and 3 is to reduce the expected time needed to detect an OOC condition caused by a systematic change in the mean. How they accomplish this is intuitively clear: a small increase in the mean only slightly increases the chance of triggering rule 1, but greatly increases the chance of triggering rules 2 and 3. For example, a one-sd increase in the mean raises the chance of a rule 1 violation to $1 - \Phi(3-1) + \Phi(-3-1)$ = 2.28%, which is expected to take about 1/0.0228 ≈ 44 time steps to detect, but a rule 3 violation (four in a row above 1 sd) now has slightly greater than a $(1/2)^4$ = 6.25% chance of occurring, which will be detected almost three times more quickly (around 16 time steps).
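
These two detection times can be verified directly:

p.rule1 <- 1 - pnorm(3 - 1) + pnorm(-3 - 1)   # about 2.28% per point after the one-sd shift
1 / p.rule1                                   # about 44 points until a rule 1 signal
p.rule3 <- 0.5^4                              # four in a row above +1 sd, each now a 50:50 event
1 / p.rule3                                   # 16 points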

In summary, these rules can be understood by analyzing sequences of coin flips (or die rolls); each one corresponds to an event (or sequence of events) that is sufficiently rare that a process in control will go for a long time without triggering an OOC signal; and the combined set of rules is formulated to be able to detect relatively small shifts of the mean as quickly as possible.
