If you are willing to assume that $Y$ has a symmetric distribution within the two groups, then the medians of the two groups (i.e., $q_{21}$ and $q_{22}$) could be used in place of the means. Furthermore, if you are willing to assume that $Y$ is normally distributed within the two groups, then you could make use of the relationship between the IQR and the SD for the normal distribution, namely, $SD \approx IQR / 1.35$. So, you can compute the two IQRs with $IQR_1 = q_{31} - q_{11}$ and $IQR_2 = q_{32} - q_{12}$, transform them to SDs, pool those two SDs in the usual manner, and then you have all of the pieces to compute the standardized mean difference.
Example: For your example data, this would be $$IQR_1 = 174 - 58 = 116$$ $$IQR_2 = 158 - 31 = 127,$$ so $$SD_1 = 116 / 1.35 = 85.93$$ $$SD_2 = 127 / 1.35 = 94.07.$$ Therefore, $$SD_p = \sqrt{\frac{(80-1)85.93^2 + (46-1)94.07^2}{80+46-2}} = 88.97.$$ And finally: $$d = \frac{85-79}{88.97} = 0.07$$ Now you could use the usual equation to estimate the sampling variance of $d$ (Hedges & Olkin, 1985): $$v = \frac{1}{80} + \frac{1}{46} + \frac{0.07^2}{2(80+46)} = 0.034.$$
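As a rough sketch, the calculation above can be reproduced in R (the variable names are mine; the quartiles, medians, and group sizes are taken from the example):

```r
# quartiles, medians, and group sizes from the example above
q1_1 <- 58;  q3_1 <- 174; med1 <- 85; n1 <- 80
q1_2 <- 31;  q3_2 <- 158; med2 <- 79; n2 <- 46

# convert IQRs to SDs under normality (SD ~= IQR / 1.35)
sd1 <- (q3_1 - q1_1) / 1.35
sd2 <- (q3_2 - q1_2) / 1.35

# pooled SD, standardized mean difference, and its sampling variance
sdp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
d   <- (med1 - med2) / sdp
v   <- 1/n1 + 1/n2 + d^2 / (2 * (n1 + n2))
round(c(d = d, v = v), 3)
```

This reproduces $d \approx 0.07$ and $v \approx 0.034$ (small differences can arise from rounding the SDs before pooling).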
Remarks: Under normality, $d$ should be an okay estimator of the true SMD. However, using medians in place of means and estimating the SDs via the IQRs involve a loss of precision. The usual equation for the sampling variance of $d$ does not reflect that, so it yields values that are probably too small (on average).
Also, the appropriateness of this method hinges on the symmetry/normality assumption. Unfortunately, authors typically choose to report medians and IQRs precisely when they suspect that $Y$ has a non-normal or asymmetric distribution. So, I would regard this method only as a rough approximation.
References:
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
I will attempt an answer, but I am not totally sure of my own knowledge on the subject.
The bootstrap, as far as I know, is always done on the original data. In your case the original data are pairs, so to do a bootstrap you would have to resample (with replacement) the pairs of the original data. That is equivalent to bootstrapping the difference scores and performing the effect size calculation, as you described, on the resampled values.
I get a different result from you (in R):
```r
a = read.table(header=F, text="
1999 2040
1501 1601
1552 1623
2385 2386
2488 2671
1257 1218
1806 1719
1348 1405
2048 2079
1810 2017
1308 1356
2310 2324
1247 1616
1839 1878
1235 1370
")
d = a$V2 - a$V1
mean(d)/sd(d)
## [1] 0.7006464
aux = function(x, i) mean(x[i])/sd(x[i])
bb = boot::boot(d, aux, R=1000)
mean(bb$t)
## [1] 0.7530415
boot::boot.ci(bb)
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot::boot.ci(boot.out = bb)
##
## Intervals :
## Level      Normal              Basic
## 95%   ( 0.1840,  1.0846 )   ( 0.1454,  1.0570 )
##
## Level     Percentile            BCa
## 95%   ( 0.3443,  1.2559 )   ( 0.1634,  1.0722 )
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable
```
(code corrected as per the comments)
Indeed the direct calculation of the effect size (mean(d)/sd(d)) does not match the bootstrap calculation (mean(bb$t)). I don't know how to explain it, beyond noting that the gap between the bootstrap mean and the original statistic is the usual bootstrap estimate of the estimator's bias.
The only confidence interval that matches yours is the percentile one. (I don't really know which interval to choose on theoretical grounds; I use the BCa, which I think was suggested somewhere.)
The second way to calculate a CI on the effect size is to use analytical formulas. This question on CV discussed the formulas: "How can I calculate the 95% confidence interval of an effect size if I have the mean difference score, CI of that difference score"
Using the MBESS package I get the following CI:
```r
MBESS::ci.sm(Mean = mean(d), SD = sd(d), N = length(d))
## [1] "The 0.95 confidence limits for the standardized mean are given as:"
## $Lower.Conf.Limit.Standardized.Mean
## [1] 0.1231584
##
## $Standardized.Mean
## [1] 0.7006464
##
## $Upper.Conf.Limit.Standardized.Mean
## [1] 1.258396
```
As for your suggestion on computing the confidence interval for the difference score and using it to compute a confidence interval on the effect size, I have never heard of it, and I would suggest not using it.
Best Answer
Using a permutation test doesn't necessarily preclude an effect size, although the connection between the test and the effect size may be broken. In general, a permutation test works by reshuffling the data and computing a statistic many times. The statistic could be something like a mean difference, or it could be a test statistic (e.g., $t$). Either way, this allows for an empirical estimate of the sampling distribution under the null. Comparing your statistic to the sampling distribution allows you to compute a $p$-value. A simple example of a permutation test can be seen in @jbowman's answer here: The z-test vs the χ2-test for comparing the odds of catching a cold in 2 groups. For a pairwise variant, the shuffling would only be within the pairs, but otherwise the principle is the same.
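As a minimal sketch of the paired variant (my own illustration, using the difference scores from the data posted above): with paired data, shuffling within pairs amounts to randomly flipping the sign of each difference score, which yields the null distribution of the chosen statistic.

```r
set.seed(1)
# difference scores (V2 - V1) from the data above
d <- c(41, 100, 71, 1, 183, -39, -87, 57, 31, 207,
       48, 14, 369, 39, 135)
obs <- mean(d)                          # observed mean difference

# paired permutation test: randomly flip the sign of each difference,
# i.e., swap the two members of each pair with probability 1/2
R <- 10000
perm <- replicate(R, mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
p <- mean(abs(perm) >= abs(obs))        # two-sided permutation p-value
p
```

The same resampling scheme works for any other statistic (e.g., mean(d)/sd(d)); only the statistic changes, not the within-pair shuffling.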