Why Parametric Tests Are More Powerful Than Non-Parametric Tests – An Exploration

Tags: nonparametric, parametric, statistical-power

I'd like to understand why parametric tests are more powerful than their non-parametric alternatives. Is the word choice of "power" the same as statistical power? As I understand it, power just relates to the likelihood of getting a p-value that will correctly reject a false/incorrect null hypothesis, but I don't understand how this relates to statistical tests based on the normal distribution specifically.

Best Answer

This answer is mostly going to reject the premises in the question. I'd have made it a comment calling for a rephrasing of the question so as not to rely on those premises, but it's much too long, so I guess it's an answer.

Why are parametric tests more powerful than non-parametric tests?

As a general statement, the title premise is false. Parametric tests are not, in general, more powerful than nonparametric tests. Some books make such blanket claims, but the claim makes no sense unless we are very specific about which parametric tests, which nonparametric tests, and which parametric assumptions we mean. In fact, it's typically only true if we deliberately choose the circumstances under which the parametric test has the highest power relative to any other test -- and even then, there are often nonparametric tests with equivalent power in very large samples (with small effect sizes).

Is the word choice of "power" the same as statistical power?

Yes. However, to compute power we need to specify a precise set of assumptions and a specific alternative.
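To make that concrete, here is a minimal Monte Carlo sketch in Python (the sample size, shift, and alpha below are illustrative choices of mine, not anything implied by the question): "power" is only defined once you have pinned down the distribution, the specific alternative, and the test.

```python
import numpy as np
from scipy import stats

# Monte Carlo estimate of the power of the pooled two-sample t-test under
# one fully specified scenario: normal populations with equal variance,
# a particular shift in means, and fixed n and alpha. Change any of these
# assumptions and "the power" changes with them.
rng = np.random.default_rng(0)
n, shift, alpha, n_sims = 30, 0.5, 0.05, 10_000

rejections = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n)      # first sample
    y = rng.normal(shift, 1.0, n)    # second sample, mean shifted up
    # one-sided test: is the mean of x less than the mean of y?
    rejections += stats.ttest_ind(x, y, alternative="less").pvalue < alpha

print(f"Estimated power: {rejections / n_sims:.3f}")
```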

I don't understand how this relates to statistical tests based on the normal distribution specifically.

Neither the term "parametric" nor the term "nonparametric" relates specifically to the normal distribution.

See the opening paragraph here:

https://en.wikipedia.org/wiki/Parametric_statistics

Parametric statistics is a branch of statistics which assumes that sample data comes from a population that can be adequately modeled by a probability distribution that has a fixed set of parameters.$^{[1]}$ Conversely a non-parametric model does not assume an explicit (finite-parametric) mathematical form for the distribution when modeling the data. However, it may make some assumptions about that distribution, such as continuity or symmetry.

Some textbooks (particularly ones written for students in some application areas, typically by academics in those areas) get this definition quite wrong. Beware; in my experience, if this term is misused, much else will tend to be wrong as well.


Can we make a true statement that says something like what's in your question? Yes, but it requires heavy qualification.

If we use the uniformly most powerful test (should such a test exist) under some specific distributional assumption, and that distributional assumption is exactly correct, and all the other assumptions hold, then no nonparametric test will exceed that power (otherwise the parametric test would not have been uniformly most powerful after all). However -- in spite of stacking the deck in favour of the parametric test like that -- in many cases you can find a nonparametric test that has the same large-sample power in exactly that stacked-deck situation; it just won't be one of the common rank-based tests you're likely to have seen before.

In the parametric case, we are choosing a test statistic that captures all the information about the difference from the null, given the distributional assumption and the specific form of the alternative. If you optimize power under some set of assumptions, obviously you can't beat it under those same assumptions, and that's the situation we're in.

Conover's book Practical Nonparametric Statistics has a section discussing tests with an asymptotic relative efficiency (ARE) of 1 relative to tests that assume normality, where the ARE is computed under that normal assumption. He focuses there on normal scores tests (score-based rank tests, which I would tend to avoid in most typical situations for other reasons), but it does help to illustrate that the claimed advantages of parametric tests may not always be so clear. It's the next section (on permutation tests, under "Fisher's Method of Randomization") where I tend to focus. In any case, such stacking of the deck in favour of the parametric assumption still doesn't universally beat nonparametric tests.
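As a sketch of the Fisher-randomization idea (my own minimal implementation, not code from Conover's book): keep the parametric t statistic, but calibrate it against the permutation distribution of the group labels rather than against the t distribution.

```python
import numpy as np
from scipy import stats

def permutation_t_test(x, y, n_perms=10_000, rng=None):
    """One-sided permutation test built on the two-sample t statistic.

    Under the null of identical distributions, group labels are
    exchangeable, so we recompute the statistic over random relabelings
    and compare the observed value against that reference distribution.
    """
    rng = rng if rng is not None else np.random.default_rng()
    pooled = np.concatenate([x, y])
    observed = stats.ttest_ind(x, y).statistic
    count = 0
    for _ in range(n_perms):
        perm = rng.permutation(pooled)
        t = stats.ttest_ind(perm[:len(x)], perm[len(x):]).statistic
        count += t <= observed   # one-sided: small t means x below y
    # add-one correction keeps the p-value valid (never exactly zero)
    return (count + 1) / (n_perms + 1)
```

Because the reference distribution comes from the data's own permutations, the test holds its significance level without the normality assumption, while (as discussed below) its large-sample power at the normal matches that of the t-test.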

Of course, in a real-world testing situation such neatly 'stacked decks' don't occur. The parametric model is not a fact about our real data but a model -- a convenient approximation. As George Box put it, "all models are wrong".

In this case the questions we would want to ask are (a) "is there a nonparametric test that's essentially as powerful as this parametric test in the situation where the parametric assumption holds?" (to which the answer is often 'yes') and (b) "how far do we need to modify the exact parametric assumption before it is less powerful than some suitable nonparametric test?" (which is often "hardly at all"). In that case, if you don't know which of the two sets of circumstances you're in, why would you prefer the parametric test?


Let me address a common test. Consider the two-sample equal-variance t-test, which is uniformly most powerful for a one-sided test of a shift in mean when the population is exactly normal.

(a) Is it more powerful than every nonparametric test?

Well, no, in the sense that there are nonparametric tests whose asymptotic relative efficiency is 1 (that is, if you look at the ratio of sample sizes required to achieve the same power at a given significance level, that ratio goes to 1 in large samples); specifically there are permutation tests (e.g. based on the same statistic) with this property. The asymptotic power is also a good guide to the relative power at typical sample sizes (if you make sure the tests are being performed at the same actual significance level).
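A quick simulation illustrates the point (a sketch with settings chosen by me for illustration; it uses scipy.stats.permutation_test, available in SciPy 1.9+, and takes a few minutes at these settings): at exactly normal data, the permutation test built on the same t statistic rejects at essentially the same rate as the t-test itself.

```python
import numpy as np
from scipy import stats

# Power of the pooled t-test vs. a permutation test using the same t
# statistic, under exactly normal, equal-variance data. Illustrative
# settings; expect the two estimated powers to be very close.
rng = np.random.default_rng(1)
n, shift, alpha, n_sims = 20, 0.8, 0.05, 1000

t_rej = perm_rej = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(shift, 1.0, n)
    t_rej += stats.ttest_ind(x, y, alternative="less").pvalue < alpha
    res = stats.permutation_test(
        (x, y),
        lambda a, b: stats.ttest_ind(a, b).statistic,
        permutation_type="independent",
        alternative="less",
        n_resamples=999,
        random_state=rng,
    )
    perm_rej += res.pvalue < alpha

print(f"t-test power:      {t_rej / n_sims:.3f}")
print(f"permutation power: {perm_rej / n_sims:.3f}")
```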

(b) Do you need to modify the situation much before some non-parametric test has better power?

As I suggested above, in this case of a location test under normality: hardly at all. Even if we restrict consideration to just the most commonly used rank tests (which limits our potential power), you don't need to make the distribution very much heavier-tailed than the normal before the Wilcoxon-Mann-Whitney test typically has better power. If we're allowed to choose a test with better power at the normal (though the Wilcoxon-Mann-Whitney already performs excellently there), the crossover can come even sooner.
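To see how little tail weight it takes, here is a sketch comparing the two tests on normal data and on t-distributed data with 3 degrees of freedom (3 df is my illustrative choice of a heavier-tailed distribution; since the two distributions have different scales, the comparison that matters is between the two tests within each distribution, not across distributions).

```python
import numpy as np
from scipy import stats

# Estimated power of the pooled t-test vs. the Wilcoxon-Mann-Whitney test
# for a location shift, under normal tails and under heavier t(3) tails.
rng = np.random.default_rng(2)
n, shift, alpha, n_sims = 25, 0.7, 0.05, 5000

def estimate_power(draw):
    t_rej = w_rej = 0
    for _ in range(n_sims):
        x, y = draw(), draw() + shift          # pure location shift
        t_rej += stats.ttest_ind(x, y, alternative="less").pvalue < alpha
        w_rej += stats.mannwhitneyu(x, y, alternative="less").pvalue < alpha
    return t_rej / n_sims, w_rej / n_sims

print("normal tails (t, WMW):", estimate_power(lambda: rng.normal(0.0, 1.0, n)))
print("t(3) tails   (t, WMW):", estimate_power(lambda: rng.standard_t(3, n)))
```

With settings like these you should see the two tests nearly tied on normal data, with the Wilcoxon-Mann-Whitney pulling ahead once the tails thicken.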

It can be extremely hard to tell whether you're sampling from a population with a very slightly heavier tail than the one you assumed, so having slightly better power (at best) in a situation you cannot be confident holds may be an extremely dubious advantage.

In any case you should not try to tell which situation you're in by looking at the sample you're conducting the test on (at least not if it will affect your choice of test), since that data-based test choice will impact the properties of your subsequently chosen test.
