Solved – Closed formula for D4 constant calculation? (Moving range chart constant)

biasbias correctionquality controlunbiased-estimator

I need to build a Moving Range Shewhart control chart given a series of observations. In short, I have to calculate the central line and the upper and lower limits as follows

cl = MR
lcl = MR - D3 * MR
ucl = MR + D4 * MR

Where MR is the average moving range and D3, D4 are unbiasing constants. Both D3 and D4 depend on the size of the moving range (n).

My textbook, as most of the books on this subject do, suggests to calculate the constants just by looking at the tables however this is really unpractical when using a software such as R or Python and you need to test multiple moving range sizes, therefore I would like to define a function to calculate D3 and D4 given the moving range size. In order to do this, I need to know the closed formula for D3 and D4 which is not explained anywhere, apparently, or I'm missing something (most likely).

Is there a formula for calculating the D3 and D4 constant for unbiasing the Moving Range chart?

Some more explanation on why these constant are needed would be helpful, the book doesn't say much on the subject.

Best Answer

Repeat the question:
So you want a form for the D4 coefficient for a Shewhart moving range control chart?

Answer:
Here is a table (link), and you can make a really fast lookup for windows within the range. It is piece-wise constant interpolation, unless you have non-integer sample sizes.

Here is a picture in case the link breaks. enter image description here

I get an okay fit for $ 3 \le n \le 25$ using a linear model.

x <- seq(from=2,to=25,by=1)
y <- c(3.267,       2.574,       2.282,       2.114,
       2.004,       1.924,       1.864,       1.816,
       1.777,       1.744,       1.717,       1.693,
       1.672,       1.653,       1.637,       1.622,
       1.608,       1.597,       1.585,       1.575,
       1.566,       1.557,       1.548,       1.541)

est <- lm (I(log(y ))~ 1 + I(log(x)) +I(log(x)^2)+I(log(x)^(0.345)) )
summary(est)

Where the result is:

$log(D4) = 3.0326040 + 0.2940527*log(n)-0.0063287*(log(n)^2) - 2.3257758*(log(n)^{0.345}) $

The summary values for the fit were:

Call:
lm(formula = I(log(y)) ~ 1 + I(log(x)) + I(log(x)^2) + I(log(x)^(0.345)))

Residuals:
       Min         1Q     Median         3Q        Max 
-4.024e-04 -1.388e-04  1.084e-05  1.242e-04  3.469e-04 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        3.0326040  0.0058494  518.45  < 2e-16 ***
I(log(x))          0.2940527  0.0037063   79.34  < 2e-16 ***
I(log(x)^2)       -0.0063287  0.0003954  -16.00 7.24e-13 ***
I(log(x)^(0.345)) -2.3257758  0.0091925 -253.01  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0002038 on 20 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 6.188e+06 on 3 and 20 DF,  p-value: < 2.2e-16

It is an un-adjusted $R^2$ of darn near 6-9's. I had to use vegan to get to that. Every parameter has a p-value smaller than 1e-13, on 24 samples. When extrapolation occurs, the function appears for a while to retain "physics". If you look at the first differences of the raw data, there is a very linear trend at progressively higher values of n. It also does not make physical sense that there should be a negative, or even a very small value for D4.

Here is the code to get the R^2

library(vegan)
RsquareAdj(est)

Here is the result. Yes, that is 5-9's.

$r.squared
[1] 0.9999989

$adj.r.squared
[1] 0.9999988

This was how I came to the 0.345 power. It is the midpoint between which the adjusted $R^2$ is constantly 0.9999988. One unit either side of the edge, the value drops to 0.99999987.

Here is the code to get the plot (and yes I am using "y" instead of "d4"):

n <- seq(from=3, to=50,by=1)
y2 <- exp(3.0326040 + 0.2940527*log(n)-0.0063287*(log(n)^2) - 2.3257758*(log(n)^0.345))

plot(n,y2,type="b",col="Green",xlab="n",ylab="D4")
points(x,y,pch=2,col="Red")
legend(x = 35,y = 2.5,legend = c("fit","data"),
       col=c("Green","Red"),
       pch=c(1,2),
       lty=c(1,-2))

Here is the result

enter image description here

I don't like extrapolating, especially without a decent understanding of what is actually going on, but if I was forced to do something then this might be where I would start.

Related Question