Solved – simulating a multiple comparisons problem using R and bonferroni correction

multiple-comparisonsrsimulation

I am actually working on an exercise for my students dealing with multiple comparisons. However, I was having trouble trying to simulate some data in R to demonstrate the problem with multiple comparisons. I was hoping someone might have some R code or at least give some guidance on how to generate a dataset in R to demonstrate the multiple comparisons problem. So first I would like to generate some data (rnorm(), runif, etc.). Then I would like to show how if I pairwise compare the parameters for 2 levels of a multiple level predictor, I am more likely to get a significant result than the standard alpha level of 0.5, or 95% confidence.

Thanks.

Best Answer

Let us simulate $100$ realisations of $N(0, 1)$ and test the null hypothesis $H_0: \mu=0$ with the t-test. If this is done many times, $H_0$ should be rejected approximately $5\%$ of the times.

pval <- rep(NA, 1e4)
for(k in 1:1e4)
{
  x <- rnorm(100)
  pval[k] <- t.test(x=x, mu=0)$p.value
}

> mean(pval < 0.05)
[1] 0.0495

$$ $$

If we now simulate $5$ random variables and test the null hypothesis that all means are simultaneously $0$, then the probability of at least one significant result is larger than $5\%$; actually it is $1 - (1 - 0.05)^5 = 0.226$.

n <- 5
pval <- matrix(NA, nrow=1e4, ncol=n)
for(k in 1:1e4)
{
  X <- replicate(n=n, expr=rnorm(100))
  pval[k, ] <- apply(X=X, MARGIN=2, FUN=function(x){
    t.test(x=x, mu=0)$p.value
  })  
}

> mean(sapply(X=1:nrow(pval), FUN=function(i){
+   any(pval[i, ] < 0.05)
+ }))
[1] 0.2203

$$ $$

But if you use the Bonferroni correction, then

> mean(sapply(X=1:nrow(pval), FUN=function(i){
+   any(pval[i, ] < 0.05 / n)
+ }))
[1] 0.0487
Related Question