Bootstrap – Conducting a Bootstrap Two-Sample t-Test

Tags: bootstrap, r, t-test

I'd like to bootstrap a two-sample t-test. My DV is a psychological variable. I have two groups (women and men) of unequal size, and I do not assume equal variances. I'm not sure whether my code and/or my thinking is correct, because in the end I got 0 bootstrap t-statistics greater than the t-statistic from the original data.

group_k  # women: N=377
group_m  # men:   N=306
t.est <- t.test(group_k, group_m, var.equal=FALSE)$stat
#        t 
# 5.659757

nullA <- group_k - mean(group_k, na.rm=T)
nullB <- group_m - mean(group_m, na.rm=T)
set.seed(1)
b <- function(){
  A <- sample(nullA, 200, replace=T)  # is 200-element from 377-element sample ok? 
  B <- sample(nullB, 200, replace=T) 
  stud_test <- t.test(A, B, var.equal=FALSE)
  stud_test$stat
}
t.vect <- replicate(10000, b())

1 - mean(t.est>t.vect)
# [1] 0 :(

I have some additional questions:

  1. Why not simply bootstrap the differences between women and men?
  2. How should I choose the bootstrap sample size? In other words, is drawing 200 elements from the 377- and 306-element groups OK? Should the sizes be 377 and 306, respectively, as this post recommends?

The idea of subtracting the means comes from gung's reply here; I thought it could be carried over directly from the ANOVA case to Student's t-test.

[UPDATE 13XII]
I corrected my code, but the results still look odd to me:

t.est <- t.test(group_k, group_m, var.equal=FALSE)$stat
# t = 5.6598, df = 255.185, p-value = 4.066e-08

b <- function(){
  A <- sample(group_k, 377, replace=T)  
  B <- sample(group_m, 306, replace=T) 
  stud_test <- t.test(A, B, var.equal=FALSE)
  stud_test$stat
}
t.vect <- replicate(10000, b())

1 - mean(t.est>t.vect)
# [1] 0.5042

Is it possible that with the original samples the difference between means is "so significant" (p-value = 4.066e-08), yet the bootstrap samples suggest it actually isn't (0.5042)?

Best Answer

As @Tim notes, your bootstrap samples should have the same $n_j$s as your original data.

Next, recognize that there are several ways to bootstrap: e.g., you can bootstrap your data directly or bootstrap a test statistic, you can bootstrap your sampling distribution or a null distribution, etc. You need to make sure you understand which kind of thing you're doing. You can bootstrap simply the mean difference, if you want to. In the linked post, I bootstrapped the null distribution of the test statistic. That is essentially what you are doing in your code.
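To illustrate the mean-difference option, here is a minimal sketch that bootstraps the sampling distribution of the mean difference directly and reads off a percentile confidence interval. Since the real `group_k` and `group_m` aren't shown, simulated data with an assumed mean difference of 0.5 stand in for them:

```r
set.seed(42)
# Simulated stand-ins for group_k / group_m (the real data aren't shown)
group_k <- rnorm(377, mean = 0.5, sd = 1.0)  # "women"
group_m <- rnorm(306, mean = 0.0, sd = 1.2)  # "men"

# Resample each group separately, with its own original n,
# and record the difference in means each time
boot_diff <- replicate(10000, {
  mean(sample(group_k, length(group_k), replace = TRUE)) -
    mean(sample(group_m, length(group_m), replace = TRUE))
})

# 95% percentile confidence interval for the mean difference
ci <- quantile(boot_diff, c(0.025, 0.975))
ci
```

If the interval excludes 0, the difference is significant at the 5% level; note this bootstraps the sampling distribution of the estimate, not a null distribution.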

Also, because of the ways tests can differ, the bootstrapping strategy may need to be customized to the test you want to perform. In the linked post, I bootstrapped an $F$-statistic, but the way the $F$-test works is somewhat different from how a $t$-test works. Since you are bootstrapping the test statistic, you are somewhat safe from that.

In your case, think about the logic of the type of bootstrap you used. You bootstrapped a null sampling distribution for your $t$-statistic. Your observed $t$-statistic is so extreme that none of the bootstrapped $t$s overlapped with it. The implication of that is that the probability ($p$-value) of getting a $t$-statistic as far or further from $0$ from your bootstrapped null sampling distribution is $< (1/10000) / 2$. In other words, your result is highly significant. (However, you should re-do your bootstrap using the correct $n_j$s before you go with this result.)
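Putting those pieces together, a corrected version of the null-distribution bootstrap might look like the sketch below: center each group at 0 to impose the null, resample with the original group sizes, and compute a two-sided p-value by counting bootstrapped $t$s at least as extreme as the observed one. Simulated data again stand in for the real groups:

```r
set.seed(1)
# Simulated stand-ins for the real data (not shown in the question)
group_k <- rnorm(377, mean = 0.5, sd = 1.0)
group_m <- rnorm(306, mean = 0.0, sd = 1.2)

t.est <- t.test(group_k, group_m, var.equal = FALSE)$stat

# Impose the null hypothesis by centering each group at 0
nullA <- group_k - mean(group_k, na.rm = TRUE)
nullB <- group_m - mean(group_m, na.rm = TRUE)

b <- function() {
  # Resample with the ORIGINAL group sizes, not an arbitrary 200
  A <- sample(nullA, length(nullA), replace = TRUE)
  B <- sample(nullB, length(nullB), replace = TRUE)
  t.test(A, B, var.equal = FALSE)$stat
}
t.vect <- replicate(10000, b())

# Two-sided bootstrap p-value: proportion of null t's as extreme as observed
p.boot <- mean(abs(t.vect) >= abs(t.est))
p.boot  # likely 0 here, i.e. p < 1/10000
```

With a true effect this large, none of the 10,000 null $t$s typically reach the observed statistic, which is exactly the "0 exceedances" situation described above.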
