Is exact power analysis of the permutation (or randomization) test possible without i.i.d. assumptions on the data?

hypothesis testing, iid, independence, statistical-power

In hypothesis testing, we have the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$). The null hypothesis typically states that units drawn from the two groups have identical outcomes, whereas the alternative states that they differ. The permutation test rejects $H_0$ if the computed p-value is less than the significance level (say $\alpha$). The permutation test itself does not require i.i.d. data; the weaker assumption of exchangeability under the null hypothesis is enough for it to be valid.

Power is $P[\text{rejection} \mid H_1]$. Is it possible to compute power without making i.i.d. assumptions? I understand that distributional assumptions (like normality) make it easier to compute the power, but I'm trying to understand the minimum set of assumptions needed. For example, if one assumes independence and the data are binary, a specific form of $H_1$ makes the outcomes in each group follow a Binomial distribution. Take $H_1: p_1=0.4, p_2=0.6$, where $p_i$ is the probability that a unit in the $i$-th group gets outcome $1$. Thus, with just the independence assumption, power can be computed for this specific form of $H_1$.
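To make the binary example concrete, here is a minimal sketch of that exact computation. It assumes independent Bernoulli outcomes within each group, a two-sided permutation test on the difference in proportions, and group sizes of 20; the function names and the sample sizes are illustrative choices, not part of the question. Conditional on the total number of ones, the permutation distribution of the first group's count is hypergeometric, so the p-value can be enumerated exactly, and the power is the Binomial-weighted sum of the rejection indicator over all possible outcome pairs.

```python
from math import comb


def binom_pmf(k, n, p):
    """P[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def perm_pvalue(x1, x2, n1, n2):
    """Exact two-sided permutation p-value for |x1/n1 - x2/n2|.

    Given the total number of ones t = x1 + x2, every relabelling of the
    n1 + n2 units is equally likely under H0, so the count of ones landing
    in group 1 is hypergeometric.
    """
    t, n = x1 + x2, n1 + n2
    observed = abs(x1 / n1 - x2 / n2)
    total = comb(n, t)
    p = 0.0
    for k in range(max(0, t - n2), min(n1, t) + 1):
        stat = abs(k / n1 - (t - k) / n2)
        if stat >= observed - 1e-12:  # tolerance guards against float ties
            p += comb(n1, k) * comb(n2, t - k) / total
    return p


def exact_power(n1, n2, p1, p2, alpha=0.05):
    """Exact rejection probability under independent Bernoulli(p_i) outcomes."""
    power = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if perm_pvalue(x1, x2, n1, n2) <= alpha:
                power += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return power


if __name__ == "__main__":
    # Hypothetical group sizes; the question does not specify any.
    print("power under H1:", exact_power(20, 20, 0.4, 0.6))
    print("size under H0: ", exact_power(20, 20, 0.5, 0.5))
```

Note that the whole calculation rests on the product of two Binomial probabilities, which is exactly where independence (and identical distribution within each group) enters.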

I can see that the assumption of ‘identically distributed’ may be necessary. If the alternative hypothesis is indeed false, we cannot draw any conclusions about the distribution of measurements in the sample unless the measurements are identically distributed. One example of non-identically distributed outcomes is a draw in which all units get $1$, which makes the test certain to fail to reject the null. Unless $H_1$ is so extreme that it makes it impossible for one group to get the outcome $1$, this draw is possible. With identically distributed measurements, this particular draw is still possible, but it has a low probability (if the alternative is true), so the null is rejected with high probability.

But, is independence necessary?

Edit: Monte Carlo simulations are a way to estimate power, but they require model assumptions about how real-world data are generated. I reworded the question to indicate that we are interested in an exact power computation.

Best Answer

The short answer is yes. You will need to go about this with a simulation study. The only wrinkle to it is that you will need to make specific assumptions about the error distribution, the nature of the dependence, heterogeneity or whatever violation may occur when conducting a power analysis. This limits what you can say about the power of the tests in general. When you actually go about collecting data, the whole concept of a "data generating mechanism" goes out the window. But simulating a variety of scenarios is useful for explaining a test's possible limitations (or lack thereof).
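A simulation study of this kind could be sketched as follows. Everything specific here is an assumption made for illustration: normal errors, a shared random component that induces correlation `rho` among units of the same group, a Monte Carlo permutation test on the difference in means, and the function names themselves. With `rho = 0` the data are i.i.d. within groups; with `rho > 0` the units are dependent and no longer exchangeable across groups, even when `delta = 0`.

```python
import random


def perm_test_pvalue(a, b, n_perm, rng):
    """Monte Carlo permutation p-value for |mean(a) - mean(b)|."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[: len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def simulate_group(n, mean, rho, rng):
    """Equicorrelated normal outcomes: one shared draw per group gives
    pairwise correlation rho between units in that group (assumed model)."""
    shared = rng.gauss(0, 1)
    return [
        mean + rho**0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
        for _ in range(n)
    ]


def estimate_power(n, delta, rho, alpha=0.05, n_sim=200, n_perm=199, seed=0):
    """Fraction of simulated data sets in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = simulate_group(n, 0.0, rho, rng)
        b = simulate_group(n, delta, rho, rng)
        if perm_test_pvalue(a, b, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        for delta in (0.0, 1.0):
            rate = estimate_power(n=10, delta=delta, rho=rho)
            print(f"rho={rho}, delta={delta}: rejection rate {rate:.3f}")
```

Comparing the `delta = 0` rows across values of `rho` shows how far the rejection rate drifts from $\alpha$ under this particular form of dependence, which is the kind of limitation such a study is meant to expose. The results say nothing about other dependence structures.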

It is a fault of classical statistics that assumptions are taught so dogmatically. Statistical tests may be applied well even when assumptions have been violated. As an analyst, your responsibility is to report the findings from these tests and discuss the limitations that may arise. As a statistician conducting a power analysis, your responsibility is to anticipate a variety of scenarios where assumptions are violated and, based on prior subject matter knowledge, recommend the test that is most general (not necessarily most powerful).

When you set up a simulation experiment to demonstrate statistical power for incorrectly applied tests, it is usually useful to report the asymptotic relative efficiency (ARE) relative to the "right" statistical test. For instance, if model misspecification, heteroscedasticity, or distributional violations occur, there is a correct test that may be applied based on the data generating mechanism that you have set up. An ARE of 1 shows the user that the "incorrect" test is just as good as the perfect one. Many statisticians and researchers would prefer a more "general" test that can be applied in many situations to a "maximum power" test that fails absolutely when assumptions are violated. This is the statistical notion of risk. For instance, a Cox proportional hazards model may have AREs as low as 0.3 relative to parametric survival models, but its ability to accommodate a wide variety of baseline survival functions is what researchers and statisticians prefer.
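For readers who want the efficiency figures pinned down, one common way to formalize them is as a ratio of sample sizes. The notation below is introduced here for illustration and is not part of the answer: $T_1$ is the test being evaluated, $T_2$ is the "right" test, and $n_i(\alpha, \beta)$ is the smallest sample size at which $T_i$ reaches power $1-\beta$ at level $\alpha$ against a fixed alternative.

$$
\mathrm{RE}(T_1, T_2) = \frac{n_2(\alpha, \beta)}{n_1(\alpha, \beta)}
$$

The asymptotic (Pitman) version takes the limit of this ratio as the alternative approaches the null. Under this reading, a value of 1 means both tests need the same sample size, and a value of 0.3 means $T_1$ needs roughly $1/0.3 \approx 3.3$ times as many observations as $T_2$ to reach the same power.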
