Solved – How to do a permutation test on model coefficients when including an interaction term

Tags: hypothesis testing, interaction, permutation test, regression, regression coefficients

Given the following model as an example:

$$Y=\beta_0+\beta_A\cdot A+\beta_B\cdot B+\beta_{AB}\cdot A \cdot B+\epsilon$$

In alternative notation:

$$Y\sim A + B + A: B$$
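For concreteness, here is a minimal sketch of such a model fit by ordinary least squares with NumPy. The data, the variable names, and the `t_stat` helper are illustrative inventions (they are reused in the sketches further down), not part of the original question; the point is that the interaction column of the design matrix is simply the elementwise product of the $A$ and $B$ columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Illustrative data for Y ~ A + B + A:B
A = rng.normal(size=n)
B = rng.normal(size=n)
Y = 1.0 + 0.5 * A - 0.3 * B + 0.8 * A * B + rng.normal(size=n)

# Design matrix: intercept, A, B, and the interaction column, which is
# just the elementwise product A * B. This is why the question arises:
# permuting the A column changes A * B unless the product column is
# deliberately held fixed.
X = np.column_stack([np.ones(n), A, B, A * B])

def t_stat(X, y, j):
    """OLS t-statistic for the j-th coefficient of y = X @ beta + e."""
    n_obs, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n_obs - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])

print(t_stat(X, Y, 3))  # reference t for the interaction coefficient beta_AB
```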

The main question:

When permuting entries of variable $A$ to test its coefficient ($\beta_A$) in a model, should an interaction term that includes it, such as $A \cdot B$, be recomputed as well?

Secondary question:

And what about testing the coefficient of the $A \cdot B$ interaction term ($\beta_{AB}$)? Is the interaction column permuted on its own, independently of the variables $A$ and $B$?

A bit of context:

I want to perform a test on the coefficients of a model (it's a canonical correlation analysis, but the question is applicable to any linear model including interactions).

I'm trying my hand at permutation tests. While it's fairly straightforward to test the canonical correlation itself, how to do the same with the variable scores, or coefficients, is less clear to me when an interaction term is included.

I've read "How to test an interaction effect with a non-parametric test (e.g. a permutation test)?", but my question is much more practical.

Best Answer

As I'm just starting with permutation tests, I thought asking a question was a good idea. Indeed, thanks to comments by @Glen_b and @user43849, I realized there were many misunderstandings and inconsistencies in my grasp of the theory. For one, I was thinking about testing the magnitude of the coefficient rather than the effect, which is what is actually of interest.

So, as I'm still learning, posting an actual answer to be criticized seemed just as good an idea.


To answer this question and settle on a permutation strategy that complies with my requirements, I turned to Anderson MJ, Legendre P. "An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model." Journal of Statistical Computation and Simulation 62.3 (1999): 271-303.

There, the authors make empirical comparisons between four permutation strategies, in addition to normal-theory $t$-statistic tests:

  1. Permutation of Raw Data (Manly, 1991, 1997)
  2. Permutation of Residuals under Reduced Model (Freedman & Lane, 1983), sketched in code after this list
  3. Permutation of Residuals under Reduced Model (Kennedy, 1995)
  4. Permutation of Residuals under Full Model (ter Braak, 1990, 1992)
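To make strategy 2 concrete, here is a rough sketch of the Freedman & Lane procedure as I understand it, reusing the illustrative data and `t_stat` helper from the sketch above: drop the column under test, permute the residuals of that reduced model, add them back to the reduced-model fitted values, and refit the full model on the reconstructed response.

```python
def freedman_lane_pvalue(X_full, y, j, n_perm=999, seed=None):
    """Permutation p-value for coefficient j via reduced-model residuals."""
    rng = np.random.default_rng(seed)
    t_ref = t_stat(X_full, y, j)

    # Reduced model: the full design matrix without the column under test.
    X_red = np.delete(X_full, j, axis=1)
    beta_red, *_ = np.linalg.lstsq(X_red, y, rcond=None)
    fitted = X_red @ beta_red
    resid = y - fitted

    t_perm = np.empty(n_perm)
    for k in range(n_perm):
        y_star = fitted + rng.permutation(resid)  # permute reduced-model residuals
        t_perm[k] = t_stat(X_full, y_star, j)

    # Two-tailed p-value, counting the reference value itself.
    return (np.sum(np.abs(t_perm) >= abs(t_ref)) + 1) / (n_perm + 1)

# e.g. p-value for the interaction coefficient beta_AB of the model above:
print(freedman_lane_pvalue(X, Y, j=3, seed=1))
```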

Here I'll quote the description given for the strategy put forward by Manly (a code sketch of these steps follows the quoted list). Given a model $Y=\mu+\beta_{1\cdot2}X+\beta_{2\cdot1}Z+\epsilon$:

  1. The variable Y is regressed on X and Z together (using least squares) to obtain an estimate $b_{2\cdot 1}$ of $\beta_{2\cdot 1}$ and a value of the usual $t$-statistic, $t_\text{ref}$, for testing $\beta_{2\cdot 1}=0$ for the real data. We hereafter refer to this as the reference value of $t$.
  2. The Y values are permuted randomly to obtain permuted values Y*.
  3. The Y* values are regressed on X and Z (unpermuted) together to obtain an estimate $b_{2\cdot 1}^*$ of $\beta_{2\cdot 1}$ and a value of $t^*$ for the permuted data.
  4. Steps 2-3 are repeated a large number of times, yielding a distribution of values of $t^*$ under permutation.
  5. The absolute value of the reference value $t_\text{ref}$ is placed in the distribution of absolute values of $t^*$ obtained under permutation (for a two-tailed $t$-test). The probability is calculated as the proportion of values in this distribution greater than or equal, in absolute value, to the absolute value of $t_\text{ref}$ (Hope, 1968).
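Here is a minimal sketch of those five steps, again reusing the illustrative data and `t_stat` helper defined earlier; note that only the response $Y$ is permuted, while the predictor columns, including any interaction column, are left untouched.

```python
def manly_raw_data_pvalue(X_full, y, j, n_perm=999, seed=None):
    """Permutation p-value for coefficient j by permuting the raw Y values."""
    rng = np.random.default_rng(seed)
    t_ref = t_stat(X_full, y, j)                 # step 1: reference t

    t_perm = np.empty(n_perm)
    for k in range(n_perm):
        y_star = rng.permutation(y)              # step 2: permute Y
        t_perm[k] = t_stat(X_full, y_star, j)    # step 3: refit, keep t*

    # steps 4-5: build the permutation distribution and compute the
    # two-tailed p-value, counting the reference value itself.
    return (np.sum(np.abs(t_perm) >= abs(t_ref)) + 1) / (n_perm + 1)

# e.g. applied to the interaction coefficient beta_AB of the question's model:
print(manly_raw_data_pvalue(X, Y, j=3, seed=2))
```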

So this strategy preserves the covariance structure of the independent variables X and Z. Other methods focus on testing partial coefficients in isolation; these are discussed in the text. Possible drawbacks of permuting the raw data are also discussed, both in the paper and in the wider literature.
