Solved – Bootstrapping a t-test in R

bootstraprresamplingt-test

I have two groups of individuals (22 in each group), which I compared using a t-test. The difference between groups was non-significant (p = .17). Because the p-value was quite low, my supervisor suggested I bootstrap the t-test, to get a sense of the distribution of t-values and p-values (we are concerned we are making a type-II error, given the small sample sizes).

So I want to: a) randomly extract (with replacement) 22 cases for each of my two groups, b) perform a t-test on these new samples, c) repeat e.g. 1000 times, d) look/plot the distribution of t and p values. So far I have been using the boot package. For a single vector (Data) with 22 components, I got the boot package to bootstrap the mean value using:

MeanFunction <- function(x, i) {
    mean(x[i[1:22]])
}
Results <- boot(Data, MeanFunction, R = 100)

However, my problem is to expand this logic to a dataframe, and using a t-test instead of a mean function. In my mind, I have to use the "strata" argument to ensure I am bootstrapping the two samples separately, and to nest the t-test within a function. However, all attempts to do so have been unsuccessful. Below is the actual dataset ("white_matter", in a dataframe):

   Control Patient
1   0.3329  0.3306
2   0.3458  0.3375
3   0.3500  0.3874
4   0.3680  0.3485
5   0.3421  0.3548
6   0.3403  0.3876
7   0.3447  0.3755
8   0.3330  0.3644
9   0.3450  0.3206
10  0.3764  0.3587
11  0.3646  0.3570
12  0.3482  0.3423
13  0.3734  0.3583
14  0.3436  0.3457
15  0.3348  0.3770
16  0.3553  0.3419
17  0.3281  0.3416
18  0.3567  0.3703
19  0.3390  0.3525
20  0.3287  0.3596
21  0.3603  0.3519
22  0.3533  0.3443

Applying the same procedure from the simple example code above; I'm thinking that I need to do something in the lines of:

TtestFunction <- function(x, i) {
    t.test(x[i[??
}
Results <- boot(white_matter, TtestFunction, R = 1000, strata = white_matter$Control ??

But as the question marks imply, I am not sure how to complete the code (and is my basic setup of the code even correct?).

Best Answer

I've never used the boot package. Bootstrapping is so trivial you can just code it from scratch. Below, I just use t.test() with the defaults; you can choose var.equal=T, alternative="greater", etc., if you'd like. I set the seed, so your results would be identical, if you don't do anything different. For the qq-plot for the t-distribution, I used the df that corresponds to equal variances, which won't quite match the bootstrap (where each iteration will have a different effective df). Under the null, p-values should be uniformly distributed, but yours clearly aren't. I'm not sure I'd draw any conclusions from that, though.

library(car)
white_matter <- read.table(text="   Control Patient
1   0.3329  0.3306
2   0.3458  0.3375
3   0.3500  0.3874
4   0.3680  0.3485
5   0.3421  0.3548
6   0.3403  0.3876
7   0.3447  0.3755
8   0.3330  0.3644
9   0.3450  0.3206
10  0.3764  0.3587
11  0.3646  0.3570
12  0.3482  0.3423
13  0.3734  0.3583
14  0.3436  0.3457
15  0.3348  0.3770
16  0.3553  0.3419
17  0.3281  0.3416
18  0.3567  0.3703
19  0.3390  0.3525
20  0.3287  0.3596
21  0.3603  0.3519
22  0.3533  0.3443", header=T)

set.seed(1315)
B      <- 1000
t.vect <- vector(length=B)
p.vect <- vector(length=B)
for(i in 1:B){
  boot.c <- sample(white_matter$Control, size=22, replace=T)
  boot.p <- sample(white_matter$Patient, size=22, replace=T)
  ttest  <- t.test(boot.c, boot.p)
  t.vect[i] <- ttest$statistic
  p.vect[i] <- ttest$p.value
}

windows()
  qqPlot(t.vect, distribution="t", df=42)

enter image description here

windows()
  qqPlot(p.vect, distribution="unif")

enter image description here