Solved – Why can bigger sample size increase power of a test

hypothesis testing

From Wikipedia

The sample size determines the amount of sampling error inherent in a
test result. Other things being equal, effects are harder to detect in
smaller samples. Increasing sample size is often the easiest way to
boost the statistical power of a test.

I wonder why it is often said that a bigger sample size can increase the power (i.e. true positive rate) of a test in general.
Does bigger sample size always increase testing power?

Added: Suppose that at each sample size $n$ we reject the null iff $T_n(X) \geq c_n$. How power changes with $n$ depends on how $T_n$ and $c_n$ are defined in terms of $n$, doesn't it? Even if $c_n$ is chosen so that the size of the testing rule is a value $\alpha \in [0,1]$ fixed for all $n$, will the power necessarily increase with $n$?
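As a concrete instance of this setup (a minimal sketch; the test, effect size, and $\alpha$ below are all assumed for illustration and not part of the question): a one-sided $z$-test for a normal mean with known variance, where $c_n$ is calibrated so the size equals $\alpha$ at every $n$, and the exact power is then computed as a function of $n$.

```python
# Assumed setup: X_1, ..., X_n iid N(mu, sigma^2) with sigma known,
# H0: mu = 0 vs HA: mu = 0.5, test statistic = sample mean.
import numpy as np
from scipy.stats import norm

alpha, mu_alt, sigma = 0.05, 0.5, 1.0   # assumed size, effect size, known sd

for n in (5, 10, 20, 50, 100):
    # critical value for the sample mean, chosen so the size is exactly alpha under H0
    c_n = norm.ppf(1 - alpha) * sigma / np.sqrt(n)
    # exact power: P(sample mean >= c_n) when the true mean is mu_alt
    power = 1 - norm.cdf((c_n - mu_alt) * np.sqrt(n) / sigma)
    print(f"n = {n:4d}   c_n = {c_n:.3f}   power = {power:.3f}")
```

In this particular case the power does increase monotonically in $n$, even though $c_n$ is recomputed at every $n$ to hold the size at $\alpha$.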

Explanations that are rigorous and intuitive are both welcome.

Thanks!

Best Answer

The power of the test depends on the distribution of the test statistic when the null hypothesis is false. If $R_n$ is the rejection region for the test statistic at sample size $n$ (calibrated under the null hypothesis), the power is $$\beta = \mbox{Prob}(X_n \in R_n \mid H_A)$$ where $H_A$ is the alternative hypothesis and $X_n$ is the test statistic for a sample of size $n$. I am assuming a simple alternative --- although in practice, we usually care about a range of parameter values.
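A rough Monte Carlo check of this formula, under an assumed simple setup (not from the answer itself): $X_n$ is the sample mean of $n$ i.i.d. $N(\mu, 1)$ observations, $H_0: \mu = 0$, $H_A: \mu = 0.5$, and $R_n = [c_n, \infty)$ with $c_n$ chosen to give size $\alpha$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, mu_alt, n, reps = 0.05, 0.5, 30, 100_000   # illustrative values

# rejection region R_n = [c_n, inf), with c_n calibrated under H0: mu = 0
c_n = norm.ppf(1 - alpha) / np.sqrt(n)
# draw the test statistic X_n (the sample mean) repeatedly under H_A
x_n = rng.normal(mu_alt, 1.0, size=(reps, n)).mean(axis=1)
print("estimated power  Prob(X_n in R_n | H_A):", (x_n >= c_n).mean())
```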

Typically, a test statistic is some sort of average whose long-run behaviour is governed by the strong and/or weak law of large numbers. As the sample size gets large, the distribution of the test statistic approaches a point mass --- under the null and under the alternative hypothesis alike.
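A small simulation of this concentration, again assuming the test statistic is the sample mean of i.i.d. $N(\mu, 1)$ data (values chosen only for illustration): its sampling distribution tightens around $\mu$ at rate $1/\sqrt{n}$, whichever hypothesis is true.

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 5_000

# sampling distribution of the statistic under H0 (mu = 0) and HA (mu = 0.5)
for mu, label in ((0.0, "H0"), (0.5, "HA")):
    for n in (10, 100, 1000):
        stat = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
        print(f"{label}: n = {n:5d}   mean = {stat.mean():+.3f}   sd = {stat.std():.4f}")
```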

Thus, as $n$ gets large, the acceptance region (the complement of the rejection region) shrinks toward the value specified by the null. Intuitively, probable outcomes under the null and probable outcomes under the alternative no longer overlap, so the rejection probability approaches 1 under $H_A$ while the size stays fixed at $\alpha$ under $H_0$. Increasing the sample size is like increasing the magnification of a telescope: from a distance, two dots might seem indistinguishably close, but through the telescope you realize there is space between them. Sample size puts "probability space" between the null and alternative.
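One hedged way to quantify the "telescope" picture, in the same assumed normal-mean setup as above: the area shared by the two sampling distributions of the sample mean (under $H_0: \mu = 0$ and $H_A: \mu = 0.5$) shrinks toward zero as $n$ grows. For two equal-variance normal densities separated by $\delta$, that shared area is $2\,\Phi\!\left(-\delta/(2\,\mathrm{sd})\right)$, with $\mathrm{sd} = 1/\sqrt{n}$ here.

```python
import numpy as np
from scipy.stats import norm

delta = 0.5   # assumed separation between the null and alternative means

for n in (5, 20, 80, 320):
    sd = 1.0 / np.sqrt(n)                      # sd of the sample mean at size n
    # area shared by the two equal-variance normal densities: 2 * Phi(-delta / (2*sd))
    overlap = 2 * norm.cdf(-delta / (2 * sd))
    print(f"n = {n:4d}   overlap between H0 and HA sampling distributions = {overlap:.4f}")
```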

I am trying to think of an example where this does not occur --- but it is hard to imagine using a test statistic whose behaviour does not ultimately lead to certainty. I can, however, imagine situations where things break down: if the number of nuisance parameters increases with the sample size, estimates can fail to converge. In time series estimation, if the series is "insufficiently random" and the influence of the past fails to diminish at a reasonable rate, problems can arise as well.
