Solved – Confidence Interval and P-Value Uncertainty for Permutation Test

confidence-interval, p-value, permutation-test

I'm learning about randomization tests right now, and two questions come to mind:

  1. It's easy and intuitive to see how the p-value is computed with a randomization test (which I think is the same as a permutation test?). However, how could we also generate a 95% confidence interval, as we do with standard parametric tests?

  2. I'm reading a document from the University of Washington on permutation tests, and a sentence on page 13 says:

    With 1000 permutations …, the uncertainty near p = 0.05 is about $\pm 1\%$.

    I wonder how we get this uncertainty.

Best Answer

However, how could we also generate a 95% confidence interval as we do with normal parametric tests?

Here's one way you could generate an interval from a resampling test, though it's not always appropriate to consider it a confidence interval$^\dagger$. For a specific example, take a test for a two-sample difference in means. Consider shifting the second sample by $\delta$ (which can be positive or negative). Then the set of $\delta$ values which would lead to non-rejection by the test at level $\alpha$ could be used as a nominally $1-\alpha$ confidence interval for the difference in means.

$\dagger$ Some authors (e.g. [1], p. 364 et seq., [2]) call an interval constructed this way (the set of parameter values not rejected by the test) a consonance interval. That's a better name for it than confidence interval, because the approach doesn't necessarily give intervals with the desired coverage (though in many situations it's possible to see that it should), and the name conveys what the interval actually tells you: an interval of values consistent with the data. Many people simply ignore the distinction; Cox & Hinkley, for example, I believe call these confidence intervals.

Gelman includes discussion of why it can sometimes be problematic to universally consider them confidence intervals here.

It's not hard to explore the coverage under particular sets of assumptions (via simulation), though, and there's no lack of people calling bootstrap intervals "confidence intervals" (even when they are sometimes seen to have nothing like the claimed coverage).

More details on how to do it in the two-sample difference-in-means case are discussed in [3], where they're called randomization confidence intervals and a claim is made about when they're exact (a claim I haven't tried to evaluate).
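To make the test-inversion idea concrete, here's a minimal sketch in Python. It is my own illustration, not the procedure from [3]: the test statistic (difference in sample means), the grid of candidate $\delta$ values, and the number of permutations are all assumptions chosen for the example.

```python
import numpy as np

def perm_pvalue(x, y, n_perm=2000, rng=None):
    """Two-sided permutation p-value for H0: equal means, statistic = mean(x) - mean(y)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = perm[:x.size].mean() - perm[x.size:].mean()
        if abs(stat) >= abs(observed):
            exceed += 1
    return (exceed + 1) / (n_perm + 1)   # add-one correction so p is never exactly 0

def consonance_interval(x, y, deltas, alpha=0.05, n_perm=2000, seed=1):
    """Keep every shift delta for which H0: mean(x) - mean(y) = delta is not rejected.
    Testing that hypothesis is the same as shifting the second sample by delta and
    testing for a zero difference. Assumes the grid `deltas` brackets the interval."""
    rng = np.random.default_rng(seed)
    kept = [d for d in deltas if perm_pvalue(x, y + d, n_perm, rng) > alpha]
    return min(kept), max(kept)

# Example with simulated data (true mean difference = 1)
rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=30)
y = rng.normal(0.0, 1.0, size=30)
lo, hi = consonance_interval(x, y, deltas=np.linspace(-1, 3, 81))
print(f"nominal 95% interval for the mean difference: ({lo:.2f}, {hi:.2f})")
```

Checking the actual coverage of intervals built this way, under whatever assumptions you care about, is then just a matter of wrapping the above in a simulation loop.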

With 1000 permutations ...., the uncertainty near p = 0.05 is about ±1%.

I wonder how we get this uncertainty?

The estimated p-value is a straight binomial proportion. So it has the same standard error as any other binomial proportion, $\sqrt{\frac{p(1-p)}{n}}$.

So if $p = 0.05$ and $n=1000$, the standard error of the observed proportion is about $0.0069$. A $90\%$ CI would be about $\pm 1.13\%$. (Alternatively, $\pm 1\%$ is about $1.45$ standard errors each side, which corresponds to a confidence interval for the underlying p-value of a bit over $85\%$.)

So, at least in a rough sense, you could talk about the uncertainty being "about 1%".
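As a quick numeric check of those figures (this is just the arithmetic from the paragraph above, added for convenience):

```python
import math

p, n = 0.05, 1000
se = math.sqrt(p * (1 - p) / n)                 # standard error ~ 0.0069
half_width_90 = 1.645 * se                      # 90% CI half-width ~ 0.0113, i.e. about +/-1.13%
z_for_1pct = 0.01 / se                          # +/-1% is ~1.45 standard errors each side
coverage = math.erf(z_for_1pct / math.sqrt(2))  # two-sided coverage 2*Phi(z) - 1 ~ 0.85
print(round(se, 4), round(half_width_90, 4), round(z_for_1pct, 2), round(coverage, 3))
```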

--

[1] Kempthorne and Folks (1971),
Probability, Statistics, and Data Analysis,
Iowa State University Press.

[2] LaMotte, L.R. and Volaufová, J. (1999),
"Prediction Intervals via Consonance Intervals",
Journal of the Royal Statistical Society, Series D (The Statistician), Vol. 48, No. 3, pp. 419-424.

[3] Ernst, M.D. (2004),
"Permutation Methods: A Basis for Exact Inference",
Statistical Science, Vol. 19, No. 4, pp. 676-685.