R – Wilcoxon Signed-Rank Test with an Additional Covariate

confoundingnonparametricrstatistical significancet-test

Problem formulation

I have 5 strains on which I administered treatment and repeated the experiment two times independently. So my data looks something like this:

strain <- factor(1:5)    
ctrl <- c(1, 1, 1, 1, 1)
trt1 <- c(60.82812657, 6.175973947, 7.395597282, 203.6573398, 24.42014734)
trt2 <- c(758.3219468, 26.84658479, 43.21119828, 103.2501452, 9.557721081)

with the output

strain ctrl       trt1       trt2
     1    1  60.828127 758.321947
     2    1   6.175974  26.846585
     3    1   7.395597  43.211198
     4    1 203.657340 103.250145
     5    1  24.420147   9.557721

The values are normalised to the control (that's why the controls are always 1). I want put statistical significance to the question: "Does the treatment increase the measurement values?"

Problem:
What is the most appropriate test to answer this question, given this specific structure of the experiment?

What I have tried so far

Ignoring the strain structure

The values don't seem to be normally distributed, so I thought a non-parametric test would be more appropriate. If I perform a Wilcoxon t-ttest individually for each treatment group

wilcox.test(ctrl, trt1)
wilcox.test(ctrl, trt2)

I get a p-value of 0.007495 each (together with an warning message: “cannot compute exact p-value with ties”).
Pooling the two treatments together increases the significance level as expected, i.e.

wilcox.test(ctrl, c(trt1, trt2))

yields a p-value of 0.002245 (together with a warning message: “cannot compute exact p-value with ties”).

Including the strain structure

What came as a surprise to me is that the paired version of this test is not statistically significant anymore, e.g. the following code

wilcox.test(ctrl, trt1, paired = TRUE)
wilcox.test(ctrl, trt2, paired = TRUE)

yields a p-value of 0.0625 each (without a warning message). Now I would like to pool the treatment groups together (similar to the case above) but still somehow retain the strain-structure of the experiment. But this is where I am stuck right now.

Anyone knows if I am on the right track, or how to include the categorical variable strain as some sort of covariate into a non-parametric statistical model?

Best Answer

As mentioned in the comments, since your data is normalized to the control, you essentially have a one-sample problem, comparing the treatment data to the control value of 1.

You could conduct analyses for each treatment separately, with, for example, a one-sample t-test (if appropriate), one-sample Wilcoxon test, or one-sample sign test. As mentioned in the comments with five observations per treatment, these analyses are not likely to be very helpful.

But as you mention, you'd like to consider the two treatments together with the acknowledgement that they are paired by Strain.

For this particular case, here's what I might do. This isn't necessarily an authoritative answer, but it makes sense to me given the limitations of the study.

Log transform the observations. This may make sense depending on how you normalized the values. But it also makes the observations relatively symmetric, makes the distribution of values easier to examine.
Construct a model whereby the structure of the observations being paired by Strain.
From there, construct confidence intervals for the estimated marginal means of the treatments. If these confidence intervals don't include 0 (log(1)=0), when you have evidence that the e.m. for the treatment is different from the control.

3b) As it turns out, for these data, it won't matter much if you ignore Strain. You could simply do the analysis by just calculating the mean and confidence intervals for the log of the observations.

3c) I decided to use a mixed-effects model with Strain as a random effect. Since there are only five levels of this variable, it may not be the most justified approach to treat this as a random effect.

These are the results of this approach. All values are log-transformed. The upshot: The confidence interval for the e.m. means do not include 0, so this suggests the values for the treatments differ from the control. But the confidence intervals for the treatments overlap quite a bit, suggesting that the two treatments aren't different.

 Treatment  emmean    SE   df lower.CL upper.CL
 Treatment1   3.29 0.698 6.35     1.60     4.97
 Treatment2   4.12 0.698 6.35     2.43     5.80

As mentioned, just ignoring Strain, and calculating the confidence intervals for the means of the logged values is similar. Here, these are done by bootstrap.

   Treatment n Mean Conf.level Percentile.lower Percentile.upper
1 Treatment1 5 3.29       0.95             2.17             4.47
2 Treatment2 5 4.12       0.95             2.97             5.49

The R code I used:

Data = read.table(header=TRUE, stringsAsFactors=TRUE, text="
strain ctrl       trt1       trt2
     1    1  60.828127 758.321947
     2    1   6.175974  26.846585
     3    1   7.395597  43.211198
     4    1 203.657340 103.250145
     5    1  24.420147   9.557721
")  

Y = c(Data$trt1, Data$trt2)
Treatment = factor(c(rep("Treatment1", 5), rep("Treatment2", 5)))
Strain   = factor(c(1:5, 1:5))
Data2 = data.frame(Strain, Treatment, Y, LogY = log(Y))

plot(log(Y)~Treatment, data=Data2)

library(lme4)

library(lmerTest)

model = lmer(LogY ~ Treatment + (1|Strain), data=Data2)

hist(residuals(model))

plot(residuals(model) ~ predict(model))

anova(model)

library(emmeans)

emmeans(model, ~ Treatment)

library(rcompanion)

groupwiseMean(LogY ~ Treatment, data=Data2, percentile=TRUE)

Related Solutions

Solved – How to analyze these data

Sometimes a formal statistical test is overkill. Row by row, the entries in the first column are the largest. Draw a picture to make this apparent: side-by-side boxplots or dotplots would work nicely.

Although this is a post-hoc comparison, if the initial intent had been to compare the first column against the rest for a shift in distribution, the most extreme characterizations would be that either all maxima or all minima occur in the first column (a two-sided test). The chance of this occurring by chance, if all columns contained values drawn at random from a common distribution, would be $2 (\frac{1}{6})^7$ = about 0.0007%.

In fact, the first two contains the largest 7 of the 42 values. Again, ex post facto, the chance of such an extreme ordering occurring equals $\frac{2}{42 \choose 7}$ = about 0.000007%.

These results indicate that any reasonably powerful test you choose to conduct will conclude there's a highly significant difference.

In any event, You don't need a p-value; you need to characterize how large the difference is (the right way to do this depends on what the data mean) and you need to seek an explanation for the difference.

Solved – Coding of categorical random effects in R: int vs factor

The difference is because you're separating the intercept and the slope in the random effect. That's an odd thing to do; the usual way to fit this model would be

OK ~ multi + (multi | item) + (1 | subject)

with multi being a factor.

What happens is that in the first model you get what you expect; the 0+multi|item term gives one parameter and the 1|item term gives one parameter, but in the second model the 0 + multi | item term results in two parameters, which are simply the estimate for each condition. If you take the 1|item term out of that model you should get a result that is equivalent to both your first model and the one I give above, except for differences in parameterization.

Note also the correlation of exactly one in your second model; this is a clue that you've overparameterized it and that one of those parameters is not necessary.