R – Mean Difference Analysis for Paired Data with Zero Inflation: Can Wilcoxon Test Work?

I have data representing the number of white matter streamline terminations in 64 regions of interest on the brain surface, collected from 40 subjects.
Below is an example of paired data for one region in the right and left brain hemispheres. The variable "value" is the total number of streamlines that terminated within this region.

# 40 subjects: the first 40 values are the right hemisphere, the next 40 the left
side <- rep(c("Right", "Left"), each = 40)

value <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11,90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

data <- data.frame(side, value)

head(data)

   side value
1 Right     0
2 Right     0
3 Right     0
4 Right   306
5 Right     4
6 Right   156

As can be seen, several subjects have a value of zero, especially on the right side.
The distribution plotted below shows that the right-side data are highly skewed because of zero inflation.
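The extent of the zero inflation can also be checked directly. This is a minimal sketch that uses the same values as above, split by side; the names right, left, zero_count and zero_share are introduced here only for illustration.

```r
# Values from the example above, split by hemisphere
# (first 40 values = Right, last 40 values = Left).
right <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,
           76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11)
left  <- c(90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,
           348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

# Count and proportion of exact zeros on each side.
zero_count <- c(Right = sum(right == 0), Left = sum(left == 0))
zero_share <- zero_count / c(length(right), length(left))

print(zero_count)
print(round(zero_share, 3))
```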

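library(ggplot2)

# Density-scaled histogram with a kernel density overlay, grouped by side
ggplot(data, aes(x = value, color = side)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "#FF6666")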

These zero values are true zeros and have to be included in my analysis. It is normal for some brain areas to have no terminations of certain white matter streamlines.

My purpose is to test the mean difference and to obtain a parameter estimate or an effect size measure that I can plot for visualization. My null hypothesis is that there is no difference between the two sides. Can the Wilcoxon signed-rank test still be used with such data? I am using R but can use SPSS if required.
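For reference, this is how the signed-rank test in question would be run on the paired values. It is a sketch only, assuming the two sides are listed in the same subject order; right, left, d and res are illustrative names. R's wilcox.test works on the ranks of the paired differences and drops pairs whose difference is exactly zero, so it addresses a shift in the distribution of the differences rather than the mean difference itself.

```r
# Values from the example above (assumed to be in the same subject order on both sides).
right <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,
           76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11)
left  <- c(90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,
           348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

# Paired differences; pairs with a difference of exactly zero are dropped by the test.
d <- left - right
n_zero_diff <- sum(d == 0)

# exact = FALSE requests the normal approximation, since ties and zero
# differences rule out an exact p-value.
res <- wilcox.test(left, right, paired = TRUE, exact = FALSE)

print(n_zero_diff)
print(res)
```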

Best Answer

Indeed, a paired Wilcoxon test is inappropriate in this case. Since you want to test the mean, but your data are highly skewed and, as I checked, 25 per cent of the values are outliers, you could use a dependent Yuen's t-test on 25 per cent trimmed means. In R you can do this easily with the yuend function from the package WRS2:

# convert data to wide list format:
> data <- split(value, list(side))
> library(WRS2)
> yuend(data$Left, data$Right, tr = 0.25)
Call:
yuend(x = data$Left, y = data$Right, tr = 0.25)

Test statistic: 3.804 (df = 19), p-value = 0.0012

Trimmed mean difference:  179.9
95 percent confidence interval:
80.9156     278.8844

Explanatory measure of effect size: 0.69

# 25% trimmed mean group 'Left':
> mean(data$Left, tr = 0.25)
[1] 196.55
# 25% trimmed mean group 'Right':
> mean(data$Right, tr = 0.25)
[1] 16.65

So you see that we measure significantly lower values in the right group. Moreover, the effect size $\xi$ suggests a large effect.
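As a cross-check, the 25 per cent trimmed means can be reproduced in base R without WRS2. This is a minimal sketch, assuming the same values as in the question; right, left and the tm_* names are illustrative. With 40 observations per side, trimming 25 per cent removes the 10 smallest and the 10 largest values and averages the middle 20.

```r
# Values from the question, split by hemisphere.
right <- c(0,0,0,306,4,156,2,3,0,146,0,218,0,0,74,5,0,833,0,640,
           76,49,0,0,163,65,0,0,5,14,2,0,92,229,0,23,338,0,0,11)
left  <- c(90,51,0,4,394,50,138,0,12,481,325,237,0,574,102,391,2,104,559,0,
           348,427,554,214,786,312,407,45,356,114,19,104,194,833,192,354,126,4,716,129)

# 25% trimmed means via base R's mean(); 'trim' is the full argument name.
tm_left  <- mean(left,  trim = 0.25)
tm_right <- mean(right, trim = 0.25)

# The same thing by hand: sort, drop 10 values from each end, average the rest.
tm_right_manual <- mean(sort(right)[11:30])

# Difference of the trimmed means (Left minus Right).
tm_diff <- tm_left - tm_right

print(c(Left = tm_left, Right = tm_right, Difference = tm_diff))
```

The difference of the two trimmed means is the "Trimmed mean difference" that yuend reports.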
