Parametric vs Nonparametric Statistics: Why Choose Parametric Methods

estimation, hypothesis testing, mathematical-statistics, nonparametric, regression

Can someone explain to me why anyone would choose a parametric over a nonparametric statistical method for hypothesis testing or regression analysis?

In my mind, it's like going rafting and choosing a watch that isn't water-resistant because you may not get it wet. Why not use the tool that works on every occasion?

Best Answer

Rarely, if ever, do a parametric test and a non-parametric test actually have the same null. The parametric $t$-test tests the mean of the distribution, assuming the first two moments exist. The Wilcoxon rank sum test does not assume any moments and tests equality of distributions instead. Its implied parameter is a weird functional of the distributions: the probability that an observation from one sample is lower than an observation from the other. You can sort of compare the two tests under the completely specified null of identical distributions, but you have to recognize that they are testing different hypotheses.
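To make the "different hypotheses" point concrete, here is a minimal sketch in Python, assuming NumPy and SciPy are available. The two simulated populations (an exponential and a narrow normal, both with mean 1) and the seed are illustrative choices of mine, not something from the discussion above. The means agree, so the $t$-test's null holds, while the probability that one observation falls below the other is far from 1/2, so the rank sum test's null does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical populations with the same mean (1.0) but different shapes.
x = rng.exponential(scale=1.0, size=n)       # right-skewed, mean 1
y = rng.normal(loc=1.0, scale=0.1, size=n)   # symmetric, mean 1

# Welch t-test: its null is about equality of the two means.
t_res = stats.ttest_ind(x, y, equal_var=False)

# Wilcoxon rank sum (Mann-Whitney) test: its null is about equal distributions,
# and it is sensitive to P(X < Y) differing from 1/2.
w_res = stats.mannwhitneyu(x, y, alternative="two-sided")

# Direct estimate of the "implied parameter" P(X < Y) over all (x_i, y_j) pairs.
p_x_less_y = np.mean(x[:, None] < y[None, :])

print(f"t-test p-value:        {t_res.pvalue:.3f}")
print(f"rank sum test p-value: {w_res.pvalue:.3g}")
print(f"estimated P(X < Y):    {p_x_less_y:.3f}")
```

In this setup the rank sum test rejects decisively even though the means are equal, which is the correct behavior for its own null and a misleading one if you read it as a test of means.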

The information that parametric tests bring in with their assumptions helps improve the power of the tests. Of course, that information had better be right, but there are few if any domains of human knowledge these days where such preliminary information does not exist. An interesting exception that explicitly says "I don't want to assume anything" is the courtroom, where non-parametric methods continue to be widely popular, and that makes perfect sense for the application. There's probably a good reason, pun intended, that Phillip Good authored good books on both non-parametric statistics and courtroom statistics.
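The power claim can be checked with a small Monte Carlo sketch, again assuming NumPy and SciPy. The sample size, shift, and number of replications are arbitrary values picked for illustration. When the normality assumption is actually true, the $t$-test usually comes out slightly ahead of the rank sum test; the gap here is small because the Wilcoxon test is known to be quite efficient under normality, and it can be larger against other non-parametric competitors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical simulation settings: two normal groups, one shifted by 1 SD.
n, shift, alpha, n_sim = 15, 1.0, 0.05, 2000

reject_t = 0
reject_w = 0
for _ in range(n_sim):
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(shift, 1.0, size=n)
    # Count how often each test rejects at level alpha on the same data.
    reject_t += int(stats.ttest_ind(x, y).pvalue < alpha)
    reject_w += int(stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha)

power_t = reject_t / n_sim
power_w = reject_w / n_sim

print(f"estimated power, t-test:        {power_t:.3f}")
print(f"estimated power, rank sum test: {power_w:.3f}")
```

Rerunning with heavy-tailed data instead of normal data is a quick way to see the other side of the trade-off, where the assumption is wrong and the parametric test loses its edge.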

There are also testing situations where you don't have access to the microdata necessary for the nonparametric test. Suppose you were asked to compare two groups of people to gauge whether one is more obese than the other. In an ideal world, you would have height and weight measurements for everybody, and you could form a permutation test stratifying by height. In a less than ideal (i.e., real) world, you may only have the mean height and mean weight in each group (or maybe some ranges or variances of these characteristics on top of the sample means). If you only have the means, your best bet is to compute the mean BMI for each group and compare them. If you have means and variances, you can assume a bivariate normal for height and weight (you'd probably have to take the correlation from some external data if it did not come with your samples), form some sort of regression lines of weight on height within each group, and check whether one line is above the other.
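As a simplified illustration of working from summaries alone, the sketch below uses SciPy's `ttest_ind_from_stats`, which needs only each group's mean, standard deviation, and size. It is not the regression-line comparison described above, just the simplest case of the same idea. The BMI values are simulated and entirely hypothetical; the raw data are generated only to show that the summary-based result matches the full-data $t$-test, something a rank or permutation test cannot do without the individual observations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical individual-level BMI values for two groups (simulated).
bmi_a = rng.normal(27.0, 4.0, size=60)
bmi_b = rng.normal(25.0, 4.0, size=80)

# Welch t-test using the full microdata.
full = stats.ttest_ind(bmi_a, bmi_b, equal_var=False)

# The same test using only per-group mean, sample SD (ddof=1), and size.
summary = stats.ttest_ind_from_stats(
    mean1=bmi_a.mean(), std1=bmi_a.std(ddof=1), nobs1=bmi_a.size,
    mean2=bmi_b.mean(), std2=bmi_b.std(ddof=1), nobs2=bmi_b.size,
    equal_var=False,
)

print(f"from microdata: t = {full.statistic:.4f}, p = {full.pvalue:.4g}")
print(f"from summaries: t = {summary.statistic:.4f}, p = {summary.pvalue:.4g}")
```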