Hypothesis Testing Large Data – Are Large Data Sets Inappropriate for Hypothesis Testing?

dataset · hypothesis-testing · large-data · sample-size · statistical-significance

In a recent article in Amstat News, the authors (Mark van der Laan and Sherri Rose) stated that "We know that for large enough sample sizes, every study—including ones in which the null hypothesis of no effect is true—will declare a statistically significant effect."

Well, I for one didn't know that. Is this true? Does it mean that hypothesis testing is worthless for large data sets?

Best Answer

It is not true. If the null hypothesis is true, it will not be rejected more often at large sample sizes than at small ones. There is a false rejection rate (the Type I error rate, alpha), usually set to 0.05, and it is independent of sample size. Therefore, taken literally, the statement is false. Nevertheless, it's possible that in some situations (even whole fields) all null hypotheses are false, and therefore all of them will be rejected if N is high enough. But is that a bad thing?

What is true is that trivially small effects can be found "significant" with very large sample sizes. That does not suggest you shouldn't have such large sample sizes. It means that how you interpret your finding depends on the effect size and the sensitivity of the test. If you have a very small effect size and a highly sensitive test, you have to recognize that the statistically significant finding may not be meaningful or useful.
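As a minimal sketch of that point (the effect size of 0.01 SD and the sample sizes here are illustrative assumptions, not from the article): with a tiny but real difference between two groups, a t test usually misses it at modest n, yet flags it as "significant" almost surely at very large n, even though the effect itself remains trivial.

```r
# Tiny true effect: group means differ by 0.01 standard deviations.
# p_at_n() draws two samples of size n and returns the two-sample t-test p-value.
p_at_n <- function(n) {
  y1 <- rnorm(n, 0, 1)
  y2 <- rnorm(n, 0.01, 1)  # assumed illustrative effect of 0.01 SD
  t.test(y1, y2, var.equal = TRUE)$p.value
}

set.seed(42)
p_at_n(100)   # typically well above .05: the test rarely detects the effect
p_at_n(10^6)  # typically far below .05: "significant", but the effect is trivial
```

The significance at n = 10^6 tells you the difference is unlikely to be exactly zero, not that a 0.01 SD difference matters for any practical purpose.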

Since some people don't believe that a test of the null hypothesis, when the null is true, has an error rate equal to the chosen cutoff at every sample size, here's a simple simulation in R demonstrating the point. Make N as large as you like and the rate of Type I errors will remain constant.

# number of subjects in each condition
n <- 100
# number of replications of the study in order to check the Type I error rate
nsamp <- 10000

ps <- replicate(nsamp, {
    #population mean = 0, sd = 1 for both samples, therefore, no real effect
    y1 <- rnorm(n, 0, 1) 
    y2 <- rnorm(n, 0, 1)
    tt <- t.test(y1, y2, var.equal = TRUE)
    tt$p.value
})
sum(ps < .05) / nsamp

# ~ .05 no matter how big n is. Note in particular that the rate does not
# keep increasing, always finding effects, when n is very large.