Bootstrap – Conducting a Bootstrap Two-Sample t-Test

Tags: bootstrap, r, t-test

I'd like to bootstrap a two-sample t-test. My DV is a psychological variable. I have two groups (women and men) of unequal size, and I do not assume equal variances. I'm not sure whether my code and/or my thinking is correct, because in the end I got 0 bootstrap t-statistics greater than the t-statistic from the original data.

group_k  # women: N=377
group_m  # men:   N=306
t.est <- t.test(group_k, group_m, var.equal=FALSE)$stat
#        t 
# 5.659757

nullA <- group_k - mean(group_k, na.rm=T)
nullB <- group_m - mean(group_m, na.rm=T)
set.seed(1)
b <- function(){
  A <- sample(nullA, 200, replace=T)  # is 200-element from 377-element sample ok? 
  B <- sample(nullB, 200, replace=T) 
  stud_test <- t.test(A, B, var.equal=FALSE)
  stud_test$stat
}
t.vect <- replicate(10000, b())

1 - mean(t.est>t.vect)
# [1] 0 :(

I have some additional questions:

  1. Why not simply bootstrap the differences between women and men?
  2. How should I choose the bootstrap sample size? In other words, is drawing 200 elements from the 377- and 306-element groups OK? Should the sizes be 377 and 306, respectively, as this post recommends?

The idea of subtracting the means comes from gung's reply here; I thought it could be carried over directly from the ANOVA case to Student's t-test.

[UPDATE 13XII]
I corrected my code, but the results still look odd to me:

t.est <- t.test(group_k, group_m, var.equal=FALSE)$stat
# t = 5.6598, df = 255.185, p-value = 4.066e-08

b <- function(){
  A <- sample(group_k, 377, replace=T)  
  B <- sample(group_m, 306, replace=T) 
  stud_test <- t.test(A, B, var.equal=FALSE)
  stud_test$stat
}
t.vect <- replicate(10000, b())

1 - mean(t.est>t.vect)
# [1] 0.5042

Is it possible that with the original samples the difference between means is "so significant" (p-value = 4.066e-08), yet the bootstrap samples suggest it actually isn't (0.5042)?

Best Answer

As @Tim notes, your bootstrap samples should have the same $n_j$s as your original data.

Next, recognize that there are several ways to bootstrap: e.g., you can bootstrap your data directly or bootstrap a test statistic, you can bootstrap your sampling distribution or a null distribution, etc. You need to make sure you understand which kind of thing you're doing. You can bootstrap simply the mean difference, if you want to. In the linked post, I bootstrapped the null distribution of the test statistic. That is essentially what you are doing in your code.
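To illustrate the mean-difference option, here is a minimal sketch that bootstraps the sampling distribution of the mean difference directly and reads off a percentile confidence interval. Since the real `group_k` and `group_m` aren't shown, simulated data with an assumed mean difference of 0.5 stand in for them:

```r
set.seed(42)
# Simulated stand-ins for group_k / group_m (the real data aren't shown)
group_k <- rnorm(377, mean = 0.5, sd = 1.0)  # "women"
group_m <- rnorm(306, mean = 0.0, sd = 1.2)  # "men"

# Resample each group separately, with its own original n,
# and record the difference in means each time
boot_diff <- replicate(10000, {
  mean(sample(group_k, length(group_k), replace = TRUE)) -
    mean(sample(group_m, length(group_m), replace = TRUE))
})

# 95% percentile confidence interval for the mean difference
ci <- quantile(boot_diff, c(0.025, 0.975))
ci
```

If the interval excludes 0, the difference is significant at the 5% level; note this bootstraps the sampling distribution of the estimate, not a null distribution.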

Also, because of the ways tests can differ, the bootstrapping strategy may need to be customized to the test you want to perform. In the linked post, I bootstrapped an $F$-statistic, but the way the $F$-test works is somewhat different from how a $t$-test works. Since you are bootstrapping the test statistic, you are somewhat safe from that.

In your case, think about the logic of the type of bootstrap you used. You bootstrapped a null sampling distribution for your $t$-statistic. Your observed $t$-statistic is so extreme that none of the bootstrapped $t$s overlapped with it. The implication of that is that the probability ($p$-value) of getting a $t$-statistic as far or further from $0$ from your bootstrapped null sampling distribution is $< (1/10000) / 2$. In other words, your result is highly significant. (However, you should re-do your bootstrap using the correct $n_j$s before you go with this result.)
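Putting those pieces together, a corrected version of the null-distribution bootstrap might look like the sketch below: center each group at 0 to impose the null, resample with the original group sizes, and compute a two-sided p-value by counting bootstrapped $t$s at least as extreme as the observed one. Simulated data again stand in for the real groups:

```r
set.seed(1)
# Simulated stand-ins for the real data (not shown in the question)
group_k <- rnorm(377, mean = 0.5, sd = 1.0)
group_m <- rnorm(306, mean = 0.0, sd = 1.2)

t.est <- t.test(group_k, group_m, var.equal = FALSE)$stat

# Impose the null hypothesis by centering each group at 0
nullA <- group_k - mean(group_k, na.rm = TRUE)
nullB <- group_m - mean(group_m, na.rm = TRUE)

b <- function() {
  # Resample with the ORIGINAL group sizes, not an arbitrary 200
  A <- sample(nullA, length(nullA), replace = TRUE)
  B <- sample(nullB, length(nullB), replace = TRUE)
  t.test(A, B, var.equal = FALSE)$stat
}
t.vect <- replicate(10000, b())

# Two-sided bootstrap p-value: proportion of null t's as extreme as observed
p.boot <- mean(abs(t.vect) >= abs(t.est))
p.boot  # likely 0 here, i.e. p < 1/10000
```

With a true effect this large, none of the 10,000 null $t$s typically reach the observed statistic, which is exactly the "0 exceedances" situation described above.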
