It shouldn't matter much, since the test statistic will always be the difference in means (or something equivalent to it). Small differences can come from the implementation of the Monte Carlo methods. Trying the three packages with your data, with a one-sided test for two independent samples:
DV <- c(x1, y1)
IV <- factor(rep(c("A", "B"), c(length(x1), length(y1))))
library(coin) # for oneway_test(), pvalue()
pvalue(oneway_test(DV ~ IV, alternative="greater",
distribution=approximate(B=9999)))
[1] 0.00330033
library(perm) # for permTS()
permTS(DV ~ IV, alternative="greater", method="exact.mc",
control=permControl(nmc=10^4-1))$p.value
[1] 0.003
library(exactRankTests) # for perm.test()
perm.test(DV ~ IV, paired=FALSE, alternative="greater", exact=TRUE)$p.value
[1] 0.003171822
To check the exact p-value against a manual calculation over all permutations, I'll restrict the data to the first 9 values of each group.
x1 <- x1[1:9]
y1 <- y1[1:9]
DV <- c(x1, y1)
IV <- factor(rep(c("A", "B"), c(length(x1), length(y1))))
pvalue(oneway_test(DV ~ IV, alternative="greater", distribution="exact"))
[1] 0.0945907
permTS(DV ~ IV, alternative="greater", exact=TRUE)$p.value
[1] 0.0945907
# perm.test() gives different result due to rounding of input values
perm.test(DV ~ IV, paired=FALSE, alternative="greater", exact=TRUE)$p.value
[1] 0.1029412
# manual exact permutation test
idx <- seq_along(DV) # indices to permute
idxA <- combn(idx, length(x1)) # all possibilities for different groups
# function to calculate difference in group means given index vector for group A
getDiffM <- function(x) { mean(DV[x]) - mean(DV[!(idx %in% x)]) }
resDM <- apply(idxA, 2, getDiffM) # difference in means for all permutations
diffM <- mean(x1) - mean(y1) # observed difference in group means
# p-value: proportion of group means at least as extreme as observed one
(pVal <- sum(resDM >= diffM) / length(resDM))
[1] 0.0945907
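As an aside on "or something equivalent" above: with fixed group sizes, the sum of group A's values is a strictly increasing function of the difference in means, so both statistics order the permutations identically and give the same p-value. A self-contained sketch with made-up toy data (the values of x1 and y1 here are illustrative, not your data):

```r
# toy data, for illustration only
x1 <- c(5.2, 4.1, 6.3, 5.8, 4.9)
y1 <- c(3.9, 4.4, 3.1, 4.0, 3.5)
DV   <- c(x1, y1)
idx  <- seq_along(DV)
idxA <- combn(idx, length(x1))   # all assignments of indices to group A

# statistic 1: difference in group means
resDM <- apply(idxA, 2, function(x) mean(DV[x]) - mean(DV[-x]))
# statistic 2: sum of group-A values (equivalent for fixed group sizes)
resS  <- apply(idxA, 2, function(x) sum(DV[x]))

pDM <- mean(resDM >= (mean(x1) - mean(y1)))
pS  <- mean(resS  >= sum(x1))
pDM == pS   # TRUE: both statistics give the same permutation p-value
```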
coin and exactRankTests are both from the same author, but coin seems to be more general and extensive, also in terms of documentation. exactRankTests is no longer actively developed. I'd therefore choose coin (also because of informative functions like support()), unless you'd rather not deal with S4 objects.
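To illustrate what support() buys you: together with dperm(), it gives the full exact permutation distribution of the standardized statistic, from which you can recover the p-value by hand. A sketch, assuming the 9-value DV and IV from above are still defined:

```r
library(coin)
ot   <- oneway_test(DV ~ IV, alternative="greater", distribution="exact")
supp <- support(ot)       # all values the standardized statistic can take
dens <- dperm(ot, supp)   # exact permutation probabilities
sum(dens)                 # should be 1
# p-value recovered manually from the exact distribution:
sum(dens[supp >= statistic(ot)])
```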
EDIT: for two dependent samples (paired data), the syntax is
id <- factor(rep(1:length(x1), 2)) # factor for participant
pvalue(oneway_test(DV ~ IV | id, alternative="greater",
distribution=approximate(B=9999)))
[1] 0.00810081
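The blocked test above conditions on participants; the corresponding manual version permutes within pairs, i.e. flips the sign of each within-pair difference. A self-contained sketch with made-up paired data:

```r
# toy paired data, for illustration only
x1 <- c(5.2, 4.1, 6.3, 5.8, 4.9)
y1 <- c(3.9, 4.4, 3.1, 4.0, 3.5)
d  <- x1 - y1                        # within-pair differences
n  <- length(d)
# all 2^n possible sign assignments
signs <- as.matrix(expand.grid(rep(list(c(1, -1)), n)))
resM  <- signs %*% d / n             # mean difference under each sign flip
# one-sided exact p-value: proportion at least as extreme as observed
mean(resM >= mean(d))
```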
Your call for this test:
wilcox_test(y ~ condition | participant, data = data, alternative = "greater",
distribution = "approximate")
produces not a signed rank test (as the other calls do) but a rank sum test. That's a different test for a different situation (independent samples, not paired data). The output tells you so:
## Approximative Wilcoxon Mann-Whitney Rank Sum Test
So there's no way that should give a similar p-value to the other cases.
from your comment:
which indicates that it should handle pairwise comparisons as well.
That may be a correct conclusion, but whatever is needed to make it do that, you clearly didn't achieve it: the output shows you didn't get a signed rank test.
The two exact signed rank tests did produce the same p-value, as one would hope.
With extremely small p-values you should not expect approximate methods to be highly accurate (close to the exact test values). They do all lead one to reject at any sensible significance level, which is about as much as you can ask as far as consistency goes.
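You can quantify how much a resampled p-value should wobble: an estimate based on B random permutations is a binomial proportion, so its standard error is roughly sqrt(p(1-p)/B). A quick sketch (the 0.003 and B = 9999 are the values from the examples above):

```r
# Monte Carlo standard error of a resampled p-value estimate
mcSE <- function(p, B) sqrt(p * (1 - p) / B)
# for a p-value around 0.003 based on B = 9999 permutation draws:
mcSE(0.003, 9999)   # roughly 0.00055, so 0.003 vs. 0.0033 is within noise
```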
As for differences between them, you'd have to look to exactly what has been implemented for each test - what approximations are made, what assumptions, and so on.
The last p-value (the pairwise comparison) doesn't seem to be a one-tailed test so it's hardly surprising it's about twice as large as the two above it.
Best Answer
I doubt it, since it's not necessary -- we can compute the mean and variance without knowing the exact distribution, even in the presence of ties. But you could only tell with absolute certainty what the code does by actually looking at it.
No, the z value should be exact at any sample size (in the sense that it's a correct computation of the number of standard deviations the test statistic is from the mean under the null), but the resulting quantity doesn't have an exact normal distribution (under the null), so if you computed a p-value from that z, then the p-value would be approximate.
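To make that distinction concrete: for the rank sum statistic the null mean and variance are known in closed form, so the z value itself is an exact computation, while the p-value obtained from the normal distribution is an approximation. A sketch with toy data, without continuity or tie corrections:

```r
# toy data, for illustration only (no ties)
x <- c(5.2, 4.1, 6.3, 5.8, 4.9)
y <- c(3.9, 4.4, 3.1, 4.0, 3.5)
n1 <- length(x); n2 <- length(y)
W  <- sum(rank(c(x, y))[1:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U
mu  <- n1 * n2 / 2                                  # exact null mean
sig <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)           # exact null SD
z <- (W - mu) / sig          # exact number of SDs from the null mean
pnorm(z, lower.tail=FALSE)   # approximate one-sided p-value
```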
Sure --
The test statistic isn't exactly normally distributed.
If they're using sampling of the permutation distribution rather than exact calculations, there's also some simulation error.