Why is the bootstrap function for a paired-samples t-test in R not returning the same result as SPSS?

bootstrap, paired-data

I want to use a bootstrap approach to test the hypothesis that the intervention has no effect in a group of six patients, each measured before (X) and after (Y) the intervention. My data:

ID      X      Y
1  9.856  8.992
2 19.512  4.573
3  1.936  1.572
4 14.575  1.529
5  8.476 12.000
6  1.862  1.417
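To make the example reproducible, the data can be entered in R as below. The object name `data` matches the call `boot.p.value(data, S=1000)` in the question; storing `ID` as row names rather than as a column is an assumption, made so that `data[,1]` and `data[,2]` pick up X and Y as the question's code expects.

```r
# Pre (X) and post (Y) measurements for the six patients in the table.
# Assumption: ID is kept as row names, so column 1 is X and column 2 is Y.
data <- data.frame(
  X = c(9.856, 19.512, 1.936, 14.575, 8.476, 1.862),
  Y = c(8.992,  4.573, 1.572,  1.529, 12.000, 1.417),
  row.names = 1:6
)

# Observed paired t statistic (df = 5): the value every resample is compared with.
obs <- t.test(x = data[, 1], y = data[, 2], paired = TRUE)
obs$statistic
```

The observed statistic comes out at roughly t = 1.39 with 5 degrees of freedom.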

In R (2.15.1) I've written the following code, which computes the paired t statistic on resampled pairs:

boot.p.value <- function(data, S)
{
    boot.t.stat <- as.numeric()   ## collects the bootstrap t statistics
    ## observed paired t statistic (column 1 = X, column 2 = Y)
    t.stat <- t.test(x=data[,1], y=data[,2], paired=TRUE)$statistic
    for(s in 1:S)
    {
        boot.data <- data[sample(1:nrow(data), replace=TRUE),] ## resample pairs
        boot.t.stat[s] <- t.test(x=boot.data[,1], y=boot.data[,2], paired=TRUE)$statistic
    }
    ## proportion of bootstrap statistics at least as large as the observed one
    p.value <- sum(1*(boot.t.stat >= t.stat))/S
    return(p.value)
}

Running it gives:

boot.p.value(data, S=1000)
[1] 0.518

On repeated runs, the p-value stays between 0.4 and 0.6.

For the same data set, the paired-samples t-test in SPSS version 19 gives a bootstrap-based p = 0.182 with 1000 resamples. Why do the results differ?

Best Answer

Your bootstrap function is not correct.

I know why all of your p-values are between 0.4 and 0.6 and average about 0.5: roughly half of your resamples give a test statistic below the original and half give one above it. Because you only resample the observed pairs, nothing forces the null hypothesis of no effect to hold, so the resampled statistics scatter around the observed statistic rather than around zero. You will always get that result from this function; I tried it with some other data. You aren't randomly switching the pre and post values.
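A quick way to see this is to look at where the resampled statistics fall relative to the observed one. The sketch below resamples the differences X − Y, which for a paired t statistic is equivalent to resampling whole pairs. The helper `paired_t` is hypothetical, written here so that a resample consisting of six copies of the same pair gives `NA` instead of making `t.test()` stop with an error about constant data.

```r
# Differences X - Y for the six patients (from the table in the question).
d <- c(9.856, 19.512, 1.936, 14.575, 8.476, 1.862) -
     c(8.992,  4.573, 1.572,  1.529, 12.000, 1.417)

# Hypothetical helper: paired t statistic mean(d) / (sd(d) / sqrt(n)).
# Returns NA when all resampled differences are identical (sd = 0).
paired_t <- function(d) {
  s <- sd(d)
  if (s == 0) return(NA_real_)
  mean(d) / (s / sqrt(length(d)))
}

set.seed(1)
t_obs <- paired_t(d)
S <- 2000

# Resample pairs as in the question: H0 is never imposed on the resamples.
boot_t <- replicate(S, paired_t(sample(d, replace = TRUE)))
boot_t <- boot_t[!is.na(boot_t)]

# The statistics scatter around t_obs rather than around 0,
# so roughly half of them end up at or above it.
prop_above <- mean(boot_t >= t_obs)
prop_above
```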

To get the bootstrap p-value, you compare your observed test statistic,

 t.test(x=data[,1], y=data[,2], paired=TRUE)$statistic 

with the statistics you get after randomly shuffling the pre and post data. So you need to resample from your original data AND randomly mix up the pre and post values (while keeping each pair together).
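One way to sketch this suggestion is shown below. It is one reading of the answer, not SPSS's bootstrap algorithm (which is not described here), so it need not reproduce p = 0.182. Swapping X and Y within a pair simply flips the sign of that pair's difference, so each resample draws pairs with replacement and then flips each difference's sign with probability 1/2. This makes the resampled statistics symmetric around 0, as the null hypothesis requires. The function name `boot_swap_p`, the default `S = 2000`, and the two-sided comparison are assumptions.

```r
# Sketch of "resample pairs AND randomly swap pre/post within each pair".
# Assumptions: two-sided test, S resamples, x and y are paired numeric vectors.
boot_swap_p <- function(x, y, S = 2000) {
  d <- x - y
  n <- length(d)
  t_stat <- function(v) {
    s <- sd(v)
    if (s == 0) return(NA_real_)   # all resampled differences identical
    mean(v) / (s / sqrt(n))
  }
  t_obs <- t_stat(d)
  boot_t <- replicate(S, {
    v <- sample(d, n, replace = TRUE)            # resample whole pairs
    flip <- sample(c(-1, 1), n, replace = TRUE)  # randomly swap pre and post
    t_stat(flip * v)
  })
  boot_t <- boot_t[!is.na(boot_t)]
  # Share of null statistics at least as extreme as the observed one.
  mean(abs(boot_t) >= abs(t_obs))
}

set.seed(1)
X <- c(9.856, 19.512, 1.936, 14.575, 8.476, 1.862)
Y <- c(8.992,  4.573, 1.572,  1.529, 12.000, 1.417)
p <- boot_swap_p(X, Y)
p
```

For a one-sided test in the same direction as the question's `boot.t.stat >= t.stat`, the last line of the function would become `mean(boot_t >= t_obs)`.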

I'll try to post some code later if you still need help.
