Effect Size and Bootstrapping Methods in Paired t-Test

bootstrap, effect-size, paired-data, t-test

I have multiple paired $t$-tests, such as one giving results:

$t_{14} = 2.7,\ p = .017$

Although people seem to compute effect sizes in different ways for repeated-measures designs, I have taken the mean difference divided by the standard deviation of the differences (I'll call this $d$, though maybe I should call it something else?) and get $0.70$. I also have a very strong correlation between the samples; I'm not sure if that is problematic.
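
(In symbols: $d = \bar{D}/s_D$, where $D$ denotes the paired difference scores and $s_D$ their standard deviation.)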

I would like to put confidence limits around my effect size estimate. To do so, I randomly resample (with replacement) from the difference scores, compute $d$ in the same way, and repeat this 1000 times. My question is whether this is a good approach, rather than, say, just giving confidence limits around the unstandardised difference, or resampling from the original samples.

My bootstrap gives me a mean $d$ of $0.79$ with confidence limits of $[0.4, 1.4]$. I've tried this on other random data too. Why am I getting a consistently higher $d$ from bootstrapping, and why are the intervals asymmetric? Is this because of skew in the (difference) scores, and does this make the approach more or less robust?


Edit: here is an example of the data involved. 15 people were each measured twice (A and B).

Mean A = 1742; SD = 435
Mean B = 1820; SD = 426
Mean difference = 78, SD of differences = 111, $d$ = 0.70

    A    B
 1999 2040
 1501 1601
 1552 1623
 2385 2386
 2488 2671
 1257 1218
 1806 1719
 1348 1405
 2048 2079
 1810 2017
 1308 1356
 2310 2324
 1247 1616
 1839 1878
 1235 1370
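
For concreteness, here is a minimal sketch in R of the resampling procedure described above (resample the difference scores with replacement, recompute $d$, repeat 1000 times); the variable names and the seed are illustrative, not from the original post:

A = c(1999, 1501, 1552, 2385, 2488, 1257, 1806, 1348, 2048, 1810, 1308, 2310, 1247, 1839, 1235)
B = c(2040, 1601, 1623, 2386, 2671, 1218, 1719, 1405, 2079, 2017, 1356, 2324, 1616, 1878, 1370)
D = B - A                            # difference scores
mean(D) / sd(D)                      # original d, about 0.70
set.seed(1)                          # illustrative seed for reproducibility
d_boot = replicate(1000, {
  s = sample(D, replace = TRUE)      # resample the differences
  mean(s) / sd(s)                    # recompute d on the resample
})
mean(d_boot)                         # bootstrap mean of d
quantile(d_boot, c(0.025, 0.975))    # percentile 95% limits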

Best Answer

I will attempt to answer, but I am not totally sure of my own knowledge on the subject.

The bootstrap, as far as I know, is always done on the original data. In your case the original data are pairs, so to bootstrap you would resample (with replacement) the pairs of the original data. Because the effect size is computed from the within-pair differences, that is equivalent to bootstrapping the difference scores and performing the effect size calculation on each resample, as you described (a sketch of this appears after the bootstrap output below).

I get a different result from yours (in R):

a=read.table(header=F,text="
1999 2040
1501 1601
1552 1623
2385 2386
2488 2671
1257 1218
1806 1719
1348 1405
2048 2079
1810 2017
1308 1356
2310 2324
1247 1616
1839 1878
1235 1370
")
d=a$V2-a$V1
mean(d)/sd(d)
[1] 0.7006464
aux=function(x,i) mean(x[i])/sd(x[i])
bb=boot::boot(d,aux,R=1000)
mean(bb$t)
[1] 0.7530415
boot::boot.ci(bb)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot::boot.ci(boot.out = bb)

Intervals : 
Level      Normal              Basic         
95%   ( 0.1840,  1.0846 )   ( 0.1454,  1.0570 )      

Level     Percentile            BCa          
95%   ( 0.3443,  1.2559 )   ( 0.1634,  1.0722 )  
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable

(code corrected as per the comments)
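
To check the equivalence mentioned above between resampling pairs and resampling difference scores, here is a sketch that bootstraps the rows of a directly (the function name aux_pairs and the object bb2 are my own, not from the original answer):

aux_pairs = function(dat, i) {
  di = dat$V2[i] - dat$V1[i]         # differences within the resampled pairs
  mean(di) / sd(di)
}
bb2 = boot::boot(a, aux_pairs, R = 1000)
mean(bb2$t)                          # close to mean(bb$t) above
boot::boot.ci(bb2, type = "perc")    # percentile interval, comparable to the one above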

Indeed, the direct calculation of the effect size (mean(d)/sd(d)) does not match the bootstrap mean (mean(bb$t)); the bootstrap mean comes out higher, consistent with what you observed. I don't know how to explain it.
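
One way to quantify that shift is the usual bootstrap bias estimate, i.e. the mean of the bootstrap replicates minus the statistic on the original data (print(bb) reports the same quantities):

bb$t0                # d on the original differences, 0.70 here
mean(bb$t) - bb$t0   # bootstrap estimate of the bias of mean(d)/sd(d)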

The only confidence interval that matches yours is the percentile interval. (I don't really know which interval to choose on theoretical grounds; I use the BCa, which I think was suggested somewhere.)
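
If you do settle on the BCa interval, boot.ci can be asked for just that type:

boot::boot.ci(bb, type = "bca")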

A second way to calculate a CI on the effect size is to use analytical formulas. This question on CV discussed the formulas: "How can I calculate the 95% confidence interval of an effect size if I have the mean difference score, CI of that difference score".

Using the MBESS package I get the following CI

MBESS::ci.sm(Mean = mean(d), SD=sd(d),N=length(d))
[1] "The 0.95 confidence limits for the standardized mean are given as:"
$Lower.Conf.Limit.Standardized.Mean
[1] 0.1231584

$Standardized.Mean
[1] 0.7006464

$Upper.Conf.Limit.Standardized.Mean
[1] 1.258396

As for your suggestion of computing the confidence interval for the raw difference score and using it to derive a confidence interval on the effect size, I have never heard of it, and I would suggest not using it.
