How to Best Visualize One-Sample Test Results

data-visualization, r, wilcoxon-signed-rank

We are currently writing a paper with several one-sample Wilcoxon tests. While visualizing two-sample tests is easy via boxplots, I was wondering whether there is any good way to visualize one-sample test results?

# Example data
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64,
        0.73, 1.46, 1.15, 0.88, 0.90, 0.74, 1.21)

wilcox.test(pd, mu = 1.1)

#   Wilcoxon signed rank test
#
# data:  pd
# V = 72, p-value = 0.5245
# alternative hypothesis: true location is not equal to 1.1

…and also:

I would like to get a Z-value instead of the V-value. I know that if I use the coin package instead of base stats I will get z-values, but the coin package
does not seem to be able to perform a one-sample Wilcoxon test.

Best Answer

Something like this?

[Figure: one-sample boxplot]

Or were you after some interval for the median, like you get with notched boxplots (but suited to a one sample comparison, naturally)?

Here's an example of that:

[Figure: boxplot with an interval for the median marked]

This uses the interval suggested in McGill et al (the one in the references of ?boxplot.stats). One could actually use notches, but that might increase the chance that it is interpreted instead as an ordinary notched boxplot.
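As a rough illustration of how such an interval relates to ordinary notches, the sketch below compares the half-width that `boxplot.stats` uses for notches (1.58 × IQR/√n) with a one-sample 95% half-width built from the normal-theory standard error of the median. The variable names are mine, and the constants 1.253 and 1.35 assume approximately normal data.

```r
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64,
        0.73, 1.46, 1.15, 0.88, 0.90, 0.74, 1.21)

n   <- length(pd)
iqr <- diff(fivenum(pd)[c(2, 4)])   # hinge-based IQR, as in boxplot.stats

# Notch half-width used by boxplot.stats (designed for comparing two medians)
notch_half <- diff(boxplot.stats(pd)$conf) / 2

# One-sample 95% half-width: 1.96 * SE(median), with SE = 1.253 * sigma / sqrt(n)
# and sigma estimated by IQR / 1.35 (both constants assume normality)
one_sample_half <- (1.96 * 1.253 / 1.35) * iqr / sqrt(n)

c(notch = notch_half, one_sample = one_sample_half)

# Interval around the sample median, to compare against the hypothesized value
median(pd) + c(-1, 1) * one_sample_half
```

The one-sample half-width is the larger of the two because the notch constant is tuned for judging whether two medians differ, not for comparing one median with a fixed value.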

Of course if you need something to more directly replicate the signed rank test, various things can be constructed that do that, which could even include the interval for the pseudo-median (i.e. the one-sample Hodges-Lehmann location estimate, the median of pairwise averages).
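To make "the median of pairwise averages" concrete, here is a minimal sketch that computes the one-sample Hodges-Lehmann estimate by hand from the example data. The names `walsh` and `hl` are my own; the pairwise averages (including each point paired with itself) are often called Walsh averages.

```r
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64,
        0.73, 1.46, 1.15, 0.88, 0.90, 0.74, 1.21)

# All pairwise averages (x_i + x_j) / 2 with i <= j
walsh <- outer(pd, pd, "+") / 2
walsh <- walsh[upper.tri(walsh, diag = TRUE)]

# One-sample Hodges-Lehmann estimate (pseudo-median)
hl <- median(walsh)
hl

# For comparison: the estimate reported by wilcox.test
wilcox.test(pd, mu = 1.1, conf.int = TRUE)$estimate
```

With n = 15 there are n(n+1)/2 = 120 such averages, so this brute-force approach is cheap here; for large samples the n² growth may matter.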

Indeed, wilcox.test can generate the necessary information for us, so this is straightforward:

> wilcox.test(pd,mu=1.1,conf.int=TRUE)

    Wilcoxon signed rank test

data:  pd
V = 72, p-value = 0.5245
alternative hypothesis: true location is not equal to 1.1
95 percent confidence interval:
 0.94 1.42
sample estimates:
(pseudo)median 
        1.1775 

and this can be plotted also:

[Figure: boxplot with the signed rank interval for the pseudomedian]

[The reason the boxplot interval is wider is that the standard error of a median at the normal (which is the assumption underlying the calculation based off the IQR) tends to be larger than that for a pseudomedian when the data are reasonably normalish.]
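The efficiency claim can be checked informally by simulation. The sketch below draws normal samples of the same size as the example data and compares the empirical standard errors of the sample median and the pseudomedian. The helper `pseudomedian`, the seed, and the number of replications are arbitrary choices of mine, not part of the original analysis.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# One-sample Hodges-Lehmann estimate: median of pairwise averages (i <= j)
pseudomedian <- function(x) {
  w <- outer(x, x, "+") / 2
  median(w[upper.tri(w, diag = TRUE)])
}

n    <- 15     # same sample size as the example data
reps <- 2000   # number of simulated samples (arbitrary)

sims <- replicate(reps, {
  x <- rnorm(n)
  c(median = median(x), pseudomedian = pseudomedian(x))
})

# Empirical standard errors of the two estimators under normality
se <- apply(sims, 1, sd)
se
```

Under normality the median should come out noticeably more variable than the pseudomedian; with heavy-tailed data the comparison can look different.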

And of course, one might want to add the actual data to the plot:

[Figure: same plot with a jittered strip chart under the interval]


Z-value

R uses the sum of the positive ranks as its test statistic (this is not the same statistic as discussed on the Wikipedia page on the test).

Hollander and Wolfe give the mean of the statistic as $n(n+1)/4$ and the variance as $n(n+1)(2n+1)/24$.

So for your data ($n = 15$, $V = 72$), this gives a mean of 60, a standard deviation of $\sqrt{310} \approx 17.61$, and a z-value of $(72 - 60)/17.61 \approx 0.682$ (ignoring the continuity correction).
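The same arithmetic can be done in R. This is a minimal sketch using the Hollander and Wolfe moments quoted above; it applies neither a continuity correction nor a tie correction, and the variable names are mine.

```r
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64,
        0.73, 1.46, 1.15, 0.88, 0.90, 0.74, 1.21)
mu <- 1.1

d <- pd - mu
d <- d[d != 0]                  # zero differences are dropped, as wilcox.test does
n <- length(d)

V <- sum(rank(abs(d))[d > 0])   # sum of the positive ranks (R's statistic)

mean_V <- n * (n + 1) / 4
sd_V   <- sqrt(n * (n + 1) * (2 * n + 1) / 24)

z <- (V - mean_V) / sd_V        # no continuity or tie correction
c(V = V, mean = mean_V, sd = sd_V, z = z)
```

If a two-sided p-value from the normal approximation is wanted, `2 * pnorm(-abs(z))` gives it; it will differ somewhat from the exact p-value that `wilcox.test` reports for small samples.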


The code I used to generate the fourth plot (from which the earlier ones can also be done by omitting unneeded parts) is a bit rough (it's mostly specific to the question, rather than being a general plotting function), but I figured someone might want it:

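# Half-width of an approximate 95% interval for the median, based on the IQR
# (a one-sample analogue of the McGill et al. notch; assumes roughly normal data)
notch1len <- function(x) {
  hinges <- stats::fivenum(x, na.rm = TRUE)
  iqr <- diff(hinges[c(2, 4)])
  # 1.253 * sigma / sqrt(n) is the SE of the median; sigma is estimated by IQR / 1.35
  (1.96 * 1.253 / 1.35) * (iqr / sqrt(sum(!is.na(x))))
}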

w <- notch1len(pd)
m <- median(pd)

boxplot(pd,horizontal=TRUE,boxwex=.4)

abline(v=1.1,col=8)
points(c(m-w,m+w),c(1,1),col=2,lwd=6,pch="|")

wt <- wilcox.test(pd, mu = 1.1, conf.int = TRUE)   # run the test once, reuse the result
ci <- wt$conf.int
est <- wt$estimate

stripchart(pd,pch=16,add=TRUE,at=0.7,cex=.7,method="jitter",col=8)

points(c(ci,est),c(0.7,0.7,0.7),pch="|",col=4,cex=c(.9,.9,1.5))
lines(ci,c(0.7,0.7),col=4)

I may come back and post more functional code later.
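In the meantime, one possible way to wrap the script above into a reusable function is sketched here. The function name `plot_one_sample` and its arguments are hypothetical choices of mine, and the sketch only reproduces the boxplot, the strip chart, and the signed rank interval (not the IQR-based median interval).

```r
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64,
        0.73, 1.46, 1.15, 0.88, 0.90, 0.74, 1.21)

# Hypothetical helper: horizontal boxplot with the hypothesized value, the raw
# data, and the signed rank interval for the pseudomedian drawn underneath.
plot_one_sample <- function(x, mu, conf.level = 0.95, at = 0.7) {
  x  <- x[!is.na(x)]
  wt <- wilcox.test(x, mu = mu, conf.int = TRUE, conf.level = conf.level)
  ci  <- wt$conf.int
  est <- wt$estimate

  boxplot(x, horizontal = TRUE, boxwex = 0.4)
  abline(v = mu, col = 8)                          # hypothesized location

  # Jittered raw data below the box
  stripchart(x, pch = 16, add = TRUE, at = at, cex = 0.7,
             method = "jitter", col = 8)

  # Interval ends (small ticks) and pseudomedian (larger tick)
  lines(ci, rep(at, 2), col = 4)
  points(c(ci, est), rep(at, 3), pch = "|", col = 4, cex = c(0.9, 0.9, 1.5))

  invisible(wt)   # return the test result so it can be inspected
}
```

Calling `plot_one_sample(pd, mu = 1.1)` should draw a plot along the lines of the fourth figure, and the invisibly returned object is the usual `wilcox.test` result.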