Solved – Paired or unpaired Wilcoxon test

paired-data, wilcoxon-mann-whitney-test, wilcoxon-signed-rank

I counted the number of geese on an intertidal mudflat on 100+ days over the winter. I made two counts on each of these days: one at low tide and one at high tide. I want to know if the number of geese present differs between high and low tide. As the data are very positively skewed, using a Wilcoxon test is appropriate. However, should I use the rank sum test (unpaired test) or the signed rank test (paired test)?

Best Answer

The days would appear to be the obvious pairing factor, suggesting a paired test.

Specifically, since you'd generally expect the high tide and low tide goose counts for a given day to be more similar than a high tide count and a low tide count from two randomly selected days, the data are paired. The typical numbers of geese will tend to go up and down over time (as flocks come in or move on), which leads to that sort of dependence.

Taking pair differences may not eliminate all of the inter-day correlation (you should probably check for that with some diagnostic, say a plot of each day's high-minus-low tide difference against the previous day's difference), but it will probably eliminate the major part of it.
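As a rough illustration of that diagnostic, here is a minimal sketch that pairs each day's high-minus-low difference with the previous day's and computes their correlation. The function name `lag1_correlation` and the count values are made up for the example; in practice you would also plot the `(previous, current)` pairs rather than rely on a single number.

```python
def lag1_correlation(diffs):
    """Pearson correlation between each difference and the previous day's."""
    x = diffs[:-1]   # previous-day differences
    y = diffs[1:]    # current-day differences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical counts, in day order, for illustration only.
high = [12, 40, 35, 3, 0, 150, 90, 88, 10, 7]
low = [10, 55, 30, 1, 2, 120, 95, 60, 12, 4]

diffs = [h - l for h, l in zip(high, low)]
pairs = list(zip(diffs[:-1], diffs[1:]))   # points for a lag-1 scatter plot
print(round(lag1_correlation(diffs), 3))
```

A value near zero (and a shapeless scatter) would suggest that differencing removed most of the day-to-day dependence; a clearly non-zero value would suggest it did not.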

Another issue of concern to me is the fact that you have counts. This introduces several features that tend to suggest that neither signed rank nor t-tests are fully suitable:

(i) the discreteness. The signed rank test relies on a continuous distribution of differences. This might be dealt with by simulating the null distribution, but is complicated by (ii)*;

(ii) the variance tends to be related to the mean. This heteroskedasticity will invalidate the t-test, tending to push the distribution of the differences toward heavier tails in a way that's hard to quantify (since the result will be a scale mixture over an unknown mixing distribution).

* One possibility would be to simulate the null distribution under one set of count assumptions (say, counts close to the average) and then quantify the likely impact of (ii) on it by trying a few plausible scenarios in which the counts vary. The actual impact may be quite small.
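The kind of simulation described in the footnote might look something like the sketch below. Everything specific in it is an assumption made for illustration: Poisson counts, a common daily mean of 20 in the baseline scenario, the particular set of varying daily means, zeros dropped and midranks used for ties, and the usual normal-approximation standardisation of the signed rank statistic. It calibrates a 5% critical value under the constant-mean scenario and then checks how often that critical value is exceeded when the daily means vary but the null (no tide effect) still holds.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson variate (Knuth's method; fine for moderate means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def signed_rank_stat(diffs):
    """Sum of ranks of positive differences; zeros dropped, midranks for ties."""
    order = sorted((d for d in diffs if d != 0), key=abs)
    stat, i = 0.0, 0
    while i < len(order):
        j = i
        while j < len(order) and abs(order[j]) == abs(order[i]):
            j += 1
        midrank = (i + 1 + j) / 2            # average of ranks i+1 .. j
        stat += midrank * sum(1 for d in order[i:j] if d > 0)
        i = j
    return stat, len(order)

def z_stat(diffs):
    """Standardised signed rank statistic (no tie correction)."""
    w, n = signed_rank_stat(diffs)
    if n == 0:
        return 0.0
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mean) / sd

def simulate_abs_z(rng, day_means, n_sims):
    """|z| under the null: both tides share each day's mean."""
    out = []
    for _ in range(n_sims):
        diffs = [poisson(rng, m) - poisson(rng, m) for m in day_means]
        out.append(abs(z_stat(diffs)))
    return out

rng = random.Random(0)
n_days, n_sims = 100, 500                    # assumed sizes, kept small for speed
constant = [20.0] * n_days                   # baseline: same mean every day
varying = [rng.choice([2.0, 5.0, 20.0, 50.0, 100.0]) for _ in range(n_days)]

null_z = sorted(simulate_abs_z(rng, constant, n_sims))
critical = null_z[int(0.95 * n_sims)]        # approximate 95th percentile

rate = sum(z > critical for z in simulate_abs_z(rng, varying, n_sims)) / n_sims
print(f"critical |z| = {critical:.2f}, rejection rate with varying means = {rate:.3f}")
```

If the rejection rate under the varying-mean scenarios stays close to 5%, that would support the suspicion that the impact is small. Real goose counts may well be overdispersed relative to the Poisson, so it would be worth repeating this with a more dispersed count distribution too.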

You have counts; you might be better off working with them as counts. There are a number of ways this might proceed: it's possible to construct chi-squared tests, or you might fit some GLM with days as a blocking factor. If you want to treat the days as a random effect, then it would be a mixed-effects GLM (aka GLMM).
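One simple version of the chi-squared idea can be sketched as follows, under the strong assumption that the counts are Poisson with a multiplicative day effect. Conditioning on each day's total then removes the day effects and leaves a binomial comparison of the two tide totals; as far as I can tell, the ratio of totals is also the tide rate ratio that a Poisson log-linear GLM with day as a blocking factor would estimate. The function name `tide_chi_squared` and the example counts are invented for illustration.

```python
import math

def tide_chi_squared(high, low):
    """1-df chi-squared test of equal high/low tide rates.

    Assumes Poisson counts with a multiplicative day effect, so that only
    the two tide totals matter once each day's total is conditioned on.
    """
    h_total, l_total = sum(high), sum(low)
    if h_total + l_total == 0:
        raise ValueError("no geese counted")
    chi2 = (h_total - l_total) ** 2 / (h_total + l_total)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared, 1 df
    rate_ratio = h_total / l_total if l_total > 0 else math.inf
    return chi2, p_value, rate_ratio

# Hypothetical counts for illustration only.
high = [12, 40, 35, 3, 0, 150, 90, 88, 10, 7]
low = [10, 55, 30, 1, 2, 120, 95, 60, 12, 4]
chi2, p_value, rate_ratio = tide_chi_squared(high, low)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, high/low rate ratio = {rate_ratio:.2f}")
```

Very skewed counts usually mean more variation than the Poisson allows, in which case this p-value will tend to be too small. That is one reason to prefer a model that estimates the extra dispersion, such as the GLM or GLMM options mentioned above.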
