How to Build a Confidence Interval for the Wilcoxon Test in R

confidence interval, nonparametric, wilcoxon-signed-rank

I want to calculate the confidence interval around the median obtained from this data set:

dat <- c(2.10, 2.35, 2.35, 3.10, 3.10, 3.15, 3.90, 3.90,  4.00,  4.80, 5.00,  5.00,  5.15,  5.35,  5.50,  6.00,  6.00,  6.25,  6.45)

The descriptive statistics:

summary(dat)
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
2.100   3.125   4.800   4.392   5.425   6.450 

I cannot find how the confidence interval reported together with the Wilcoxon test results is calculated:

wilcox.test(dat, conf.int = TRUE, correct = TRUE, exact = FALSE, conf.level = 0.99)
    Wilcoxon signed rank test with continuity correction

data:  dat
V = 190, p-value = 0.0001419
alternative hypothesis: true location is not equal to 0
99 percent confidence interval:
 3.450018 5.499933
sample estimates:
(pseudo)median 
      4.400028 

I just want to estimate the median of the population with a confidence interval using a non-parametric method. How is the confidence interval shown above related to the Wilcoxon signed rank test?

Best Answer

I just want to estimate the median of the population with a confidence interval using a non-parametric method.

Note that the interval generated for the signed rank test is for the population version of the one-sample Hodges-Lehmann statistic (the pseudomedian), not the median.

Under the assumption of symmetry, the two population quantities coincide. Symmetry is necessary under the null for the signed rank test, but not necessarily under the alternative, which is the setting in which the confidence interval is calculated. You may be happy to make that somewhat stronger assumption, but keep in mind that it's quite possible for the sample median to fall outside the CI this generates.

How is the confidence interval shown above related to the Wilcoxon signed rank test?

It's the set of values for the pseudomedian that would not be rejected by the signed rank test. You can actually find the limits that way; inverting a test like this is a pretty general way to arrive at confidence intervals for quantities that have no simpler interval construction.

There's a specific way to find the limits for the signed rank test that doesn't need you to do that, but you can use search methods to get there quite quickly with this general approach.
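As a rough sketch of that general approach, the test can be rerun over a grid of hypothesized locations, keeping the values that are not rejected at the 1% level. The grid spacing of 0.005 and the names `mu_grid`, `p_values` and `ci_search` are illustrative choices, not anything `wilcox.test` provides; a root finder such as `uniroot` could replace the grid.

```r
dat <- c(2.10, 2.35, 2.35, 3.10, 3.10, 3.15, 3.90, 3.90, 4.00, 4.80,
         5.00, 5.00, 5.15, 5.35, 5.50, 6.00, 6.00, 6.25, 6.45)

alpha <- 0.01  # corresponds to conf.level = 0.99

# Candidate values for the pseudomedian (grid spacing is an arbitrary choice)
mu_grid <- seq(min(dat), max(dat), by = 0.005)

# p-value of the signed rank test of H0: location = mu, for each candidate
p_values <- sapply(mu_grid, function(mu) {
  wilcox.test(dat, mu = mu, correct = TRUE, exact = FALSE)$p.value
})

# The interval is (approximately) the range of candidates that are not rejected
ci_search <- range(mu_grid[p_values > alpha])
ci_search
```

Up to the grid resolution, the endpoints should agree with the interval that `wilcox.test` reports when `conf.int` is switched on.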

The more specific approach for the signed rank test is based on a symmetric pair of order statistics of the Walsh averages: the averages $\frac{1}{2}(X_i+X_j)$ of each $(i,j)$ pair with $i \leq j$, i.e. including each point averaged with itself. The signed rank statistic is the number of positive Walsh averages.
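For the data in the question, the Walsh averages can be computed along these lines (a minimal sketch; `pair_means` and `walsh` are just illustrative names):

```r
dat <- c(2.10, 2.35, 2.35, 3.10, 3.10, 3.15, 3.90, 3.90, 4.00, 4.80,
         5.00, 5.00, 5.15, 5.35, 5.50, 6.00, 6.00, 6.25, 6.45)

# All pairwise averages (X_i + X_j) / 2, as an n x n matrix
pair_means <- outer(dat, dat, "+") / 2

# Keep each pair once (i <= j), including every point averaged with itself
walsh <- pair_means[upper.tri(pair_means, diag = TRUE)]

length(walsh)    # n * (n + 1) / 2 = 190 Walsh averages for n = 19
sum(walsh > 0)   # signed rank statistic for H0: location = 0
median(walsh)    # Hodges-Lehmann estimate of the pseudomedian
```

Every observation here is positive, so all 190 Walsh averages are positive, which matches the `V = 190` in the test output.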

Then if we label those averages $W_k$, $k=1, 2, \ldots, m$, where $m=n(n+1)/2$, the corresponding interval will be the symmetric pair of order statistics $(W_{(k)},W_{(m+1-k)})$, with $k$ chosen as small as possible while still giving endpoints in the non-rejection region of the test.
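A minimal sketch of that rule, assuming $k$ is taken from the exact null distribution of the signed rank statistic via `qsignrank`. This data set has ties, so the exact distribution is only approximate here, and a call to `wilcox.test` with `exact` switched off uses a normal approximation instead; the endpoints below therefore need not match its output exactly. The variable names are illustrative.

```r
dat <- c(2.10, 2.35, 2.35, 3.10, 3.10, 3.15, 3.90, 3.90, 4.00, 4.80,
         5.00, 5.00, 5.15, 5.35, 5.50, 6.00, 6.00, 6.25, 6.45)

alpha <- 0.01
n <- length(dat)
m <- n * (n + 1) / 2

# Sorted Walsh averages: order statistics W_(1) <= ... <= W_(m)
pair_means <- outer(dat, dat, "+") / 2
walsh <- sort(pair_means[upper.tri(pair_means, diag = TRUE)])

# Assumption: k is the lower alpha/2 quantile of the null distribution
# of the signed rank statistic, and is never allowed to drop below 1
k <- max(qsignrank(alpha / 2, n), 1)

# Symmetric pair of order statistics (W_(k), W_(m + 1 - k))
ci_order <- c(walsh[k], walsh[m + 1 - k])
ci_order
```

The result should land close to the interval printed by `wilcox.test`, with any small difference coming from the exact versus normal-approximation choice of $k$.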

(This pdf outlines that in some detail.)
