R – Mean Difference Analysis for Paired Data with Zero Inflation: Can Wilcoxon Test Work?

nonparametric, paired-data, r, wilcoxon-signed-rank, zero inflation

I have data representing the sum of white matter streamline terminations in 64 regions of interest on the brain surface, collected from 40 subjects.
Below is an example of paired data for one region, for both the right and left brain hemispheres. The variable "value" is the sum of all streamlines that terminated within this region.

side <- c("Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Right","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left","Left")

value <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11,90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

data <- data.frame(side, value)

head(data)

   side value
1 Right     0
2 Right     0
3 Right     0
4 Right   306
5 Right     4
6 Right   156

As can be seen, several subjects have a value of zero, especially on the right side.
Looking at the distribution of the data below, the right-side data are highly skewed due to zero inflation.

library(ggplot2)

ggplot(data, aes(x = value, color = side)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "#FF6666")

[Figure: histogram of value with density curves by side]

These zero values are true zeros and have to be included in my analysis. It is normal for the brain areas to have zero values of certain white matter streamlines.
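
To put a number on the zero inflation described above, one could tabulate the exact zeros per side and look at the paired differences. This is only a sketch: it assumes the i-th "Right" value and the i-th "Left" value belong to the same subject (the data frame carries no subject ID), and the names `right`, `left`, `zeros` and `d` are introduced here for illustration.

```r
# Same values as in the question, entered per side (40 subjects each)
right <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,
           76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11)
left  <- c(90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,
           348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

# Number of exact zeros on each side
zeros <- c(Right = sum(right == 0), Left = sum(left == 0))
print(zeros)

# Paired differences (Left minus Right), assuming matching subject order
d <- left - right

# Pairs with identical values on both sides give a difference of zero
n_zero_diff <- sum(d == 0)
print(n_zero_diff)

# Quick look at the distribution of the differences
print(summary(d))
```

Working with the differences `d` keeps the pairing explicit, which the long-format data frame does not.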

My purpose is to test the mean difference and to obtain a parameter estimate or an effect size measure to plot for visualization. My null hypothesis is that there is no difference between the two sides. Can the Wilcoxon signed-rank test still be used with such data? I am using R but can use SPSS if required.

Best Answer

No, a paired Wilcoxon test is not appropriate in this case. Since you want to test the mean, but your data are highly skewed (I checked, and 25 percent of the values are outliers), you could use Yuen's t-test for dependent samples on 25 percent trimmed means. In R you can do this easily with the yuend function from the WRS2 package:

# convert data to wide list format (one vector per side):

> data <- split(value, side)

> library(WRS2)

> yuend(data$Left, data$Right, tr = 0.25)
Call:
yuend(x = data$Left, y = data$Right, tr = 0.25)

Test statistic: 3.804 (df = 19), p-value = 0.0012

Trimmed mean difference:  179.9 
95 percent confidence interval:
80.9156     278.8844 

Explanatory measure of effect size: 0.69 

# 25% trimmed mean, group 'Left':

> mean(data$Left, trim = 0.25)
[1] 196.55

# 25% trimmed mean, group 'Right':

> mean(data$Right, trim = 0.25)
[1] 16.65

So the values on the right side are significantly lower than on the left (trimmed mean difference 179.9, p = 0.0012). Moreover, the explanatory effect size $\xi = 0.69$ suggests a large effect.
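
As a cross-check, the trimmed mean difference reported by yuend can be reproduced in base R without WRS2. The sketch below also computes the trimmed mean of the paired differences, which is a different quantity: yuend compares the marginal trimmed means, and once trimming is applied the difference of trimmed means generally does not equal the trimmed mean of the differences. It assumes the values are ordered by subject within each side; `left`, `right`, `tm_diff` and `tm_of_d` are names introduced here for illustration.

```r
# Same values as in the question, entered per side (40 subjects each)
right <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,
           76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11)
left  <- c(90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,
           348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

# Marginal 25% trimmed means (drops the lowest and highest 10 of 40 values)
tm_left  <- mean(left,  trim = 0.25)
tm_right <- mean(right, trim = 0.25)

# Difference of the marginal trimmed means
tm_diff <- tm_left - tm_right
print(c(Left = tm_left, Right = tm_right, Difference = tm_diff))

# For comparison: 25% trimmed mean of the paired differences,
# assuming the i-th entries of left and right are the same subject
tm_of_d <- mean(left - right, trim = 0.25)
print(tm_of_d)
```

Which of the two quantities to report depends on whether the interest is in the typical value per hemisphere or in the typical within-subject difference.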
