Permutation Test – What Are the Assumptions?

hypothesis-testing, permutation-test, resampling

It's often stated that permutation tests have no assumptions, but this is certainly not true. For example, if my samples are somehow correlated, I can imagine that permuting their labels would not be the correct thing to do. The only thing I found about this problem is this sentence from Wikipedia: "An important assumption behind a permutation test is that the observations are exchangeable under the null hypothesis," which I don't understand.

What are the assumptions of permutation tests? And how are these assumptions connected to different possible permutation schemes?

Best Answer

The literature distinguishes between two types of permutation tests: (1) the randomization test is the permutation test where exchangeability is satisfied by random assignment of experimental units to conditions; (2) the permutation test is the exact same test but applied to a situation where other assumptions (i.e., other than random assignment) are needed to justify exchangeability.

Some references regarding the naming conventions (i.e., randomization vs permutation): Kempthorne & Doerfler, Biometrika, 1969; Edgington & Onghena, Randomization Tests, 4th Ed., 2007.

For assumptions, the randomization test (i.e., Fisher's randomization test for experimental data) only requires what Donald Rubin refers to as the stable unit treatment value assumption (SUTVA). See Rubin's 1980 comment on Basu's paper in JASA. SUTVA is also one of the fundamental assumptions (along with strong ignorability) for causal inference under the Neyman-Rubin potential outcomes model (cf. Paul Holland's 1986 JASA paper). Essentially, SUTVA says that there is no interference between units and that the treatment conditions are the same for all recipients. More formally, SUTVA assumes independence between the potential outcomes and the assignment mechanism.

Consider the two-sample problem with participants randomly assigned to a control group or a treatment group. SUTVA would be violated if, for example, two study participants were acquainted and the assignment status of one of them exerted some influence on the outcome of the other. This is what is meant by no interference between units.
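For the two-sample case just described, a randomization test can be sketched in a few lines of R. This is a minimal illustration rather than a reference implementation: the data are made up, and `perm_test`, `n_perm`, and the choice of the difference in means as the test statistic are assumptions made for the example.

```r
# Monte Carlo randomization test for a difference in means.
# `perm_test` and its arguments are hypothetical names for this sketch.
perm_test <- function(y, group, n_perm = 2000) {
  obs <- mean(y[group == 1]) - mean(y[group == 0])
  perm <- replicate(n_perm, {
    g <- sample(group)                     # re-randomize the labels
    mean(y[g == 1]) - mean(y[g == 0])
  })
  # Two-sided p-value; the +1 counts the observed assignment itself.
  (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)
}

set.seed(42)
group <- rep(c(0, 1), each = 10)           # 10 control, 10 treated (made-up design)
y <- rnorm(20) + 1.5 * group               # made-up outcomes with a treatment shift
p <- perm_test(y, group)
p
```

Shuffling the labels is justified here because, under the null hypothesis and with random assignment, every relabeling of the units was equally likely to occur.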

The above discussion applies to the randomization test wherein participants were randomly assigned to groups. In the context of a permutation test, SUTVA is also necessary, but it may not rest on the randomization because there was none.

In the absence of random assignment, the validity of permutation tests may rely on distributional assumptions like identical shape of distribution or symmetric distributions (depending on the test) to satisfy exchangeability (see Box and Anderson, JRSSB, 1955).

In an interesting paper, Hayes, Psych Methods, 1996, shows through simulation how Type I error rates may become inflated if permutation tests are used with non-randomized data.
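A small simulation in the same spirit (not a reproduction of Hayes's study) shows one way this can happen. Both groups have the same mean, so a null hypothesis of equal means holds, but the groups differ in variance and size, so the labels are not exchangeable. The sample sizes, standard deviations, and the helper name `perm_p` are assumptions chosen for illustration.

```r
# Two-sided permutation p-value for a difference in means (hypothetical helper).
perm_p <- function(y, group, n_perm = 499) {
  obs <- abs(mean(y[group == 1]) - mean(y[group == 0]))
  perm <- replicate(n_perm, {
    g <- sample(group)
    abs(mean(y[g == 1]) - mean(y[g == 0]))
  })
  (sum(perm >= obs) + 1) / (n_perm + 1)
}

set.seed(1)
group <- rep(c(1, 0), times = c(5, 25))    # small group 1, large group 0
n_sim <- 400
p_values <- replicate(n_sim, {
  # Equal means, but the small group is far more variable.
  y <- c(rnorm(5, mean = 0, sd = 5), rnorm(25, mean = 0, sd = 1))
  perm_p(y, group)
})
reject_rate <- mean(p_values <= 0.05)
reject_rate                                # compare against the nominal 0.05
```

When the smaller group has the larger variance, as assumed here, the rejection rate should come out well above the nominal 5%, because the permutation distribution understates how variable the difference in means really is.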

Related Solutions

Solved – How do we create a confidence interval for the parameter of a permutation test

It's OK to use permutation resampling. It really depends on a number of factors. If your permutations are a relatively low number then your estimation of your confidence interval is not so great with permutations. Your permutations are in somewhat of a gray area and probably are fine.

The only difference from your prior code is that you'd generate your samples randomly instead of with permutations. And, you'd generate more of them, let's say 1000 for example. Get the difference scores for your 1000 replications of your experiment. Take the cutoffs for the middle 950 (95%). That's your confidence interval. It falls directly from the bootstrap.

You've already done most of this in your example. dif.treat is 462 items long. Therefore, you need the lower 2.5% and upper 2.5% cut offs (about 11 items in on each end).

Using your code from before...

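y <- sort(dif.treat)    # the 462 differences in ascending order
ci.lo <- y[11]          # lower cutoff: the 11th smallest value
ci.hi <- y[462-11]      # upper cutoff: 11 values lie above this one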

Offhand, I'd say that 462 is a little low, but you'll find that a bootstrap with 10,000 replications gives cutoffs that are only slightly different (likely closer to the mean).
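Since the earlier code isn't reproduced here, the following is a self-contained sketch of where a 462-item vector like `dif.treat` could come from and how to read off the cutoffs without counting indices by hand. The data in `total` are made up, and the 5/6 split is an assumption inferred from 462 = choose(11, 5).

```r
# Hypothetical data: 5 control values followed by 6 treatment values.
set.seed(1)
total <- c(rnorm(5, mean = 10), rnorm(6, mean = 11))

# Enumerate every way of choosing 5 of the 11 positions as "control".
idx <- combn(length(total), 5)             # 5 x 462 matrix of index sets
dif.treat <- apply(idx, 2, function(ctrl) {
  mean(total[-ctrl]) - mean(total[ctrl])   # treatment mean minus control mean
})

length(dif.treat)                          # one difference per split
cutoffs <- quantile(dif.treat, c(0.025, 0.975))
cutoffs                                    # interpolated 2.5% and 97.5% cutoffs
```

`quantile()` interpolates between order statistics, so its cutoffs can differ slightly from picking the 11th value in from each end.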

Thought I'd also add in some simple code requiring the boot library (based on your prior code).

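library(boot)  # provides boot() and boot.ci()
# Statistic: mean of resampled positions 6-11 minus mean of positions 1-5.
diff <- function(x,i) mean(x[i[6:11]]) - mean(x[i[1:5]])
b <- boot(total, diff, R = 1000)
boot.ci(b)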

Solved – Is exact power analysis of the permutation (or randomization) test possible without i.i.d assumptions on the data

The short answer is yes. You will need to go about this with a simulation study. The only wrinkle to it is that you will need to make specific assumptions about the error distribution, the nature of the dependence, heterogeneity or whatever violation may occur when conducting a power analysis. This limits what you can say about the power of the tests in general. When you actually go about collecting data, the whole concept of a "data generating mechanism" goes out the window. But simulating a variety of scenarios is useful for explaining a test's possible limitations (or lack thereof).
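As a rough sketch of what such a simulation study might look like, the code below estimates the power of a two-sample permutation test under one assumed scenario. The scenario (exponential errors, a location shift of 1.5, 20 units per group) and the names `perm_p`, `sim_power`, and `gen_shift` are all hypothetical; a real study would swap in whatever error distribution or dependence structure is of concern.

```r
# Two-sided permutation p-value for a difference in means (hypothetical helper).
perm_p <- function(y, group, n_perm = 499) {
  obs <- abs(mean(y[group == 1]) - mean(y[group == 0]))
  perm <- replicate(n_perm, {
    g <- sample(group)
    abs(mean(y[g == 1]) - mean(y[g == 0]))
  })
  (sum(perm >= obs) + 1) / (n_perm + 1)
}

# Estimated power = share of simulated data sets rejected at level alpha.
# `gen` is a function that returns one simulated outcome vector.
sim_power <- function(n_sim, gen, group, alpha = 0.05) {
  mean(replicate(n_sim, perm_p(gen(), group) <= alpha))
}

set.seed(2)
group <- rep(c(0, 1), each = 20)
# Assumed scenario: skewed (exponential) errors plus a shift in the treated group.
gen_shift <- function() rexp(40) + 1.5 * group
power_hat <- sim_power(300, gen_shift, group)
power_hat
```

Changing `gen_shift` is all it takes to explore another scenario, which is also why the resulting power estimate only speaks to the scenarios actually simulated.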

It is a fault of classical statistics that assumptions are taught so dogmatically. Statistical tests may be well applied when assumptions have been violated. As an analyst, your responsibility is to report the findings from these tests, and discuss the possible limitations that may arise. As a statistician conducting a power analysis, your responsibility is to anticipate a variety of scenarios where assumptions are violated and make recommendations based on prior subject matter knowledge to recommend a test that's most general (not necessarily most powerful).

When you set up a simulation experiment to demonstrate statistical power for incorrectly applied tests, it is usually useful to report the absolute relative efficiency (ARE) for the "right" statistical test. For instance, if model misspecification is happening, or heteroscedasticity, or distributional violations occur, there is a correct test that may be applied based on the data generating mechanism that you have set up. An ARE of 1 shows the user that the "incorrect" test is just as good as the perfect one. Many statisticians and researchers would prefer a more "general" test that can be applied in many situations to a "maximum power" test that fails absolutely when assumptions are violated. This is the statistical notion of risk. For instance, a Cox proportional hazards model may have AREs as low as 0.3 to parametric survival models, but its ability to accommodate a wide variety of baseline survival functions is what's preferred by researchers and statisticians.
