Solved – Permutation test for model comparison

hypothesis-testing · model-comparison · permutation-test

I have two nested, non-linear models for the same data, and I want to test whether the more complex model explains significantly more variance. Due to a necessary smoothing step, my data aren't independent, so I can't use standard tests for this (e.g. an F-test or likelihood-ratio test). I naturally thought of using a permutation test because I can permute the data before the smoothing step. This way I would introduce the same dependencies in the permuted data that also exist in the observed data, s.t. the simulated null distribution is fair. However, I can't quite come up with the right way to do this.

I can think of a way to test whether either model explains significant variance. In this case, my algorithm would be:

  1. Fit the model to the smoothed observed data, and calculate $R^2$
  2. For 10,000 iterations, repeat steps 3-5:
  3. Randomly permute the observed data (i.e. randomly shuffle predictor and response values s.t. their true relationship is destroyed)
  4. Apply smoothing to this permuted data to introduce the same dependencies that the observed $R^2$-value suffers from
  5. Fit the model to the smoothed permuted data and calculate $R^2$
  6. Compare the observed $R^2$-value to the thus-constructed null distribution of $R^2$-values for this data & model

The point being that in this case it's easy to construct the null distribution, because the null hypothesis holds that there is no relationship between the predictors in the model and the outcome variable, and it is obvious how we can permute the data to emulate this. (Formally, I guess the null hypothesis is that the values of the predictors are exchangeable w.r.t. the values of the outcome variable.)
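The single-model procedure above can be sketched in code. Note the stand-ins here are assumptions, not the asker's setup: a moving average plays the role of the smoothing step, and a quadratic fit plays the role of the non-linear model.

```python
import numpy as np

rng = np.random.default_rng(1)

def smooth(v, k=5):
    # Moving-average smoothing: a toy stand-in for the question's smoothing step
    return np.convolve(v, np.ones(k) / k, mode="same")

def fit_r2(x, y):
    # Toy model: quadratic fit; substitute your own non-linear fit here
    coeffs = np.polyfit(x, y, deg=2)
    resid = y - np.polyval(coeffs, x)
    return 1 - resid.var() / y.var()

# Simulated raw (pre-smoothing) data with a genuine relationship
x = np.linspace(0, 1, 100)
y_raw = 2 * x**2 + rng.normal(scale=0.3, size=x.size)

# Step 1: observed statistic (fit after smoothing)
r2_obs = fit_r2(x, smooth(y_raw))

# Steps 2-5: null distribution; permute FIRST, then smooth, so the
# permuted data carry the same induced dependencies as the observed data
n_perm = 1000
r2_null = np.empty(n_perm)
for i in range(n_perm):
    r2_null[i] = fit_r2(x, smooth(rng.permutation(y_raw)))

# Step 6: permutation p-value (with the +1 correction so p is never zero)
p = (1 + np.sum(r2_null >= r2_obs)) / (1 + n_perm)
```

The key line is `smooth(rng.permutation(y_raw))`: shuffling before smoothing is what makes the null distribution comparable to the observed $R^2$.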

However, what I want to do is estimate a null distribution for the increase in $R^2$ from one model to the next. The null hypothesis here is that the added parameter in the more complex model is meaningless, i.e. that the two models are exchangeable. However, it seems to me that this is an exchangeability hypothesis on the models, rather than on (some aspect of) the data, so I just don't see what I would permute in a permutation test in order to simulate this. Can anyone help me out? Am I missing something, or is this just not possible and perhaps I can only do a bootstrap here?

Update: in the original version of this question, I failed to specify that my models were non-linear. That is, they don't have the form $Y=X\beta+\epsilon$ (where $X$ is a matrix of predictors, $\beta$ is a vector of linear coefficients and $\epsilon$ is random noise). Instead, they have the more general form $Y=f(X; \theta)+\epsilon$, where $\theta$ is a vector of model parameters. The simpler model can be obtained by fixing the value of one of the parameters of the more complex/general model.

Best Answer

The typical permutation test, as you state, has a null hypothesis saying that the observations are exchangeable. If you have nested models $H_A: Y_i = \beta X_i + \epsilon_i$ and $H_0: Y_i = \epsilon_i$, with $\epsilon_i$ iid and zero-mean, then the exchangeability is a consequence of the form of the smaller model.

If you have $H_A: Y_i = \beta X_i + \gamma Z_i + \epsilon_i$ and $H_0: Y_i = \beta X_i + \epsilon_i$, then the observations $Y_i$ are not exchangeable even under the small model. Instead, you could permute the null-model residuals $Y_i - \hat \beta_0 X_i $, where $\hat \beta_0$ is an estimate of $\beta$ under the null hypothesis $\gamma=0$. If $\hat \beta_0$ is the usual least-squares estimate $(X^TX)^{-1}X^TY$, then under the null model, you're permuting $(I - X(X^TX)^{-1}X^T)\epsilon$, whose entries are not really exchangeable. For example, if $X$ is $[1, 0, 0, 0]$, the first residual is deterministically zero and the rest aren't. I recommend the bootstrap strategy from glen_b's answer, linked in comment above.
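The $X = [1, 0, 0, 0]$ example can be checked directly with the residual-maker matrix $M = I - X(X^TX)^{-1}X^T$ (a small NumPy illustration, not part of the original answer):

```python
import numpy as np

# Single predictor x = (1, 0, 0, 0): the hat matrix pins the first
# observation, so its residual is exactly zero for any response.
X = np.array([[1.0], [0.0], [0.0], [0.0]])
M = np.eye(4) - X @ np.linalg.inv(X.T @ X) @ X.T  # residual-maker matrix

eps = np.random.default_rng(3).normal(size=4)
resid = M @ eps
# resid[0] is exactly zero, while resid[1:] equals eps[1:] unchanged:
# the four residuals do not share a common distribution, so they are
# not exchangeable even though the errors eps are iid.
```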

Edit: what to do with nonlinear models

As pointed out in comments, you can fit a linear model to the residuals of another linear model and achieve the same predictions as you would via simultaneous estimation (to get an exact match, the added predictors must also be residualized on the null design, per the Frisch–Waugh–Lovell theorem). This makes it obvious how to proceed (if you have nested linear models and you don't care about the lack of exchangeability of the null-model residuals): fit the null model, permute the residuals, then fit the remaining variables.
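The linear-model equivalence is the Frisch–Waugh–Lovell theorem, and it is easy to verify numerically (a sketch with simulated data; the design matrices here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # null-model design
Z = rng.normal(size=(n, 1))                            # added predictor
y = X @ np.array([1.0, 2.0]) + 0.5 * Z[:, 0] + rng.normal(size=n)

def ols(A, b):
    # Least-squares coefficients of b on the columns of A
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Simultaneous fit of the full model [X, Z]
beta_full = ols(np.column_stack([X, Z]), y)

# Sequential fit: residualize both y and Z on the null design X,
# then regress the y-residuals on the Z-residuals
y_resid = y - X @ ols(X, y)
Z_resid = Z - X @ ols(X, Z)
gamma_seq = ols(Z_resid, y_resid)
# gamma_seq reproduces the Z coefficient of the simultaneous fit,
# and the combined fitted values match the full model's predictions.
```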

You can also do this with a nonlinear model:

  1. Fit the smaller model to data $y$, producing predictions $\hat y_0$ and predicted residuals $r_0 = y - \hat y_0$.
  2. For i in 1 to 10,000:
     • Fit the full model to $\pi_i(r_0) + \hat y_0$ and record the measure of fit ($\pi_i$ is a random permutation)
  3. Fit the full model to the actual data, record the measure of fit, and compare it to the null distribution as generated in step 2.

To implement this, you only need each model to generate predictions. You don't need them to be linear. But again, don't do this, because those residuals are not actually exchangeable, even under the null model.
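A minimal sketch of this residual-permutation scheme, with the caveat above still applying. The linear and quadratic `polyfit` models are toy stand-ins (assumptions) for the nested non-linear models $f(x;\theta)$; only the `fit_small`/`fit_full` prediction functions would change in a real application:

```python
import numpy as np

rng = np.random.default_rng(2)

def r2(y, yhat):
    # Measure of fit used as the test statistic
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def fit_small(x, y):
    # Stand-in small model: straight line (i.e. extra parameter fixed)
    return np.polyval(np.polyfit(x, y, 1), x)

def fit_full(x, y):
    # Stand-in full model: quadratic (extra parameter free)
    return np.polyval(np.polyfit(x, y, 2), x)

x = np.linspace(0, 1, 80)
y = 1 + 0.5 * x + 1.5 * x**2 + rng.normal(scale=0.2, size=x.size)

# Step 1: small-model predictions and residuals
yhat0 = fit_small(x, y)
r0 = y - yhat0

# Step 2: null distribution of the full-model fit on permuted residuals
n_perm = 1000
stat_null = np.empty(n_perm)
for i in range(n_perm):
    y_perm = yhat0 + rng.permutation(r0)
    stat_null[i] = r2(y_perm, fit_full(x, y_perm))

# Step 3: observed statistic and permutation p-value
stat_obs = r2(y, fit_full(x, y))
p = (1 + np.sum(stat_null >= stat_obs)) / (1 + n_perm)
```

Permuting $r_0$ destroys the structure the extra parameter could exploit while preserving the small model's predictions, which is what makes `stat_null` a null distribution for the full-model fit.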
