Wilcoxon signed-rank test for proportion variable response

nonparametric, paired-comparisons, proportion, wilcoxon-signed-rank

My response variable is a proportion.

The explanatory variable is categorical with two levels which are not independent.

The distribution of the response variable is not normal.

Therefore, I was thinking of using a paired Wilcoxon signed-rank test. However, I am concerned that the response variable is a proportion, so it can only take values between 0 and 1.

Can I use the Wilcoxon test?

Best Answer

  1. I'd suggest that you use a test that is designed for such count data (such as some form of chi-squared test, or a binomial GLM). Within each tumor type it sounds like you have what's in effect a 2x2 contingency table:

                   stable   not-stable  
     wild-type
     mutant-type
    

    You could also cast it as combining a series of two-sample comparisons of proportions.

    The best way to test the overall hypothesis (i.e. to combine the information across tumor types) depends on the precise form of the null and alternative hypotheses you want to test, and the question doesn't make that clear.

    For example, if you want to detect a higher proportion of stable bindings with the mutant type in one tumor type and a lower proportion in another, that calls for a different test than if you want to pick up cases where all the differences in proportion tend to go in the same direction (this is a question of which alternatives you want power against). You haven't stated your hypotheses explicitly enough to distinguish between those cases.

  2. The fact that the proportions are limited to [0,1] isn't of itself an issue. However, the Wilcoxon signed rank test comes with some assumptions and other potential issues; there are some particular ones I'll discuss:

    • under the null, you need the distribution of pair differences across pairs to be such that each rank is equally likely to get either sign (for example, it would suffice for the pair differences to have the same distribution across pairs; I doubt that more specific assumption holds here, but the broader one you actually need might be okay)

    • under the null, you need the distribution of pair differences to be symmetric (this shouldn't be a major issue).

    • if you have discrete data, the "standard" calculations designed for continuous distributions don't apply: an exact distribution of the statistic must account for the effect of tied ranks, and the normal approximation needs a variance adjusted for ties. This won't stop you using the test, but it's something to check in whatever software you use.

    It may be okay to use a signed rank test as long as it relates to a hypothesis you actually want to test; as I mentioned under 1., you haven't clearly identified what you're specifically interested in finding out.
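To make the tie issue from the last bullet concrete, here is a minimal pure-Python sketch (the function names `midranks` and `signed_rank_test` are my own, not any library's API). It drops zero differences (one common convention; others exist), gives tied absolute differences their average rank, and then (a) enumerates all 2^n sign patterns over those midranks to get the exact null distribution, which relies on the first bullet's assumption and is only practical for small n, and (b) computes a normal approximation whose variance is taken from the midranks, which equals the usual tie-corrected variance.

```python
import itertools
import math


def midranks(values):
    """1-based ranks of values, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_test(x, y):
    """Paired signed-rank test on x - y, dropping zero differences.

    Returns (T_plus, exact two-sided p, normal-approximation two-sided p).
    The exact p enumerates 2**n sign patterns, so keep n small (say n <= 20).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    r = midranks([abs(v) for v in d])
    t_plus = sum(rk for rk, v in zip(r, d) if v > 0)

    # Under H0 each (mid)rank is equally likely to carry either sign, so T_plus is a
    # random subset sum of the observed midranks, symmetric about sum(r) / 2.
    centre = sum(r) / 2
    obs_dev = abs(t_plus - centre)
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        t = sum(rk for rk, s in zip(r, signs) if s)
        if abs(t - centre) >= obs_dev - 1e-9:
            extreme += 1
    p_exact = extreme / 2 ** n

    # Null variance taken from the midranks: sum(r^2) / 4 equals
    # n(n+1)(2n+1)/24 - sum(t^3 - t)/48, i.e. the usual tie-corrected variance.
    var = sum(rk * rk for rk in r) / 4
    z = (t_plus - centre) / math.sqrt(var)
    p_normal = math.erfc(abs(z) / math.sqrt(2))
    return t_plus, p_exact, p_normal


# Hypothetical paired proportions, made up for illustration only.
wild = [0.5, 0.5, 0.75, 0.4]
mutant = [0.25, 0.25, 0.25, 0.4]
# The 0.4 vs 0.4 pair is dropped; the two 0.25 differences share midrank 1.5.
print(signed_rank_test(wild, mutant))
```

With only a handful of pairs the exact p-value is coarse (here there are just 2^3 sign patterns), which is another reason to check how your software handles ties and zeros rather than trusting a default.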

A Wilcoxon signed rank test (with the above caveats) would have power against alternatives where the proportion differences tend to go in the same direction; if that's the case, you may be better off considering a binomial GLM with a wild-vs-mutant factor, to detect a shift in the log-odds between the wild and mutant types. On the other hand, if you're interested in differences that may run in different directions across tumor types, a 2x2xk chi-square might be reasonable (as might a GLM with a tumor-type by wild-vs-mutant interaction), but a Wilcoxon signed rank test would not work for that case.
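The difference between "same direction" and "any direction" alternatives can be sketched on made-up counts, using one 2x2 table per tumor type (rows wild-type, mutant-type; columns stable, not-stable). As a stand-in for the pooled same-direction test I've swapped in the Cochran-Mantel-Haenszel (CMH) test, which tests for a common odds ratio across strata and is close in spirit to the wild-vs-mutant main effect in a binomial GLM with tumor-type terms. For the any-direction case, the sum of per-stratum Pearson chi-squares on k degrees of freedom is one version of a 2x2xk chi-square (a test of conditional independence). All function names and counts below are hypothetical, and neither statistic uses a continuity correction.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal two-sided tail plus terms x^((2j-1)/2) / (1*3*...*(2j-1)).
    s = math.sqrt(x)
    p = math.erfc(s / math.sqrt(2))
    term = s * math.exp(-x / 2) * math.sqrt(2 / math.pi)
    for j in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * j + 1)
    return p


def pearson_2x2(table):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def summed_chi2(tables):
    """Sum of per-stratum chi-squares on k df: power against differences in any direction."""
    stat = sum(pearson_2x2(t) for t in tables)
    return stat, chi2_sf(stat, len(tables))


def cmh(tables):
    """CMH statistic on 1 df: power against a shift in a common direction across strata."""
    dev, var = 0.0, 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        dev += a - (a + b) * (a + c) / n  # observed minus expected "wild, stable" count
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    stat = dev * dev / var
    return stat, chi2_sf(stat, 1)


# Made-up counts for two tumor types.
same = [[[30, 10], [10, 30]], [[30, 10], [10, 30]]]
opposite = [[[30, 10], [10, 30]], [[10, 30], [30, 10]]]
print(cmh(same), summed_chi2(same))          # both statistics large
print(cmh(opposite), summed_chi2(opposite))  # CMH cancels to 0; summed chi-square stays large
```

With consistent differences both tests reject, but when the differences run in opposite directions they cancel in the CMH statistic while the summed chi-square still picks them up. That is the practical content of deciding which alternatives you want power against.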