Solved – D’Agostino-Pearson vs. Shapiro-Wilk for normality

goodness-of-fit, hypothesis-testing, normality-assumption, probability

In the field I work in, there is a strong push to use the Shapiro-Wilk test as the default normality test (possibly due to NIST and some PubMed papers). I understand that one weakness of the Shapiro-Wilk test is its handling of tied values, but I am not sure when specifically I should consider switching to the D'Agostino-Pearson test (which is somewhat less favored by some who hold sway, for reasons unclear to me). I would like help in determining a generally valid rule for switching between the two without abusing the math.

  1. Is there a clear-cut rationale for a threshold (e.g. one repeat, a triplet value, two or more repeats of different values, a percentage of the sample involved in at least one repeat) beyond which using the Shapiro-Wilk test is obviously inappropriate?

  2. Much more nuanced, and I'm not even sure this is an appropriately narrow question: does anyone have a rationale for how similar two values should be before they are considered ties? I work with a lot of biological samples, and because of human variance, methodological noise, etc., it is fairly rare that two samples give identical readings, though they may agree to several decimal places. This may sound very ignorant, but would treating a 5% difference between two samples (or a difference adjusted for estimated experimental error) as a tie seem reasonable, or would it be risible to someone very statistically oriented?

Thank you very much for your time and guidance.

Best Answer

The ultimate reason the Shapiro-Wilk is popular has, I think, less to do with PubMed or NIST than with its excellent power in a wide variety of situations of interest (which would in turn lead to wider implementation and hence popularity); it generally comes out near the top against a wide variety of non-normal alternatives in power comparisons with other possible choices. I wouldn't claim it's the best possible omnibus test of normality, but it's a very solid choice.
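To give a feel for what such a power comparison involves, here is a minimal simulation sketch of my own (not from any published study; the lognormal alternative, the sample size, and the replication count are arbitrary choices for illustration), estimating the power of both tests with scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 50, 2000, 0.05   # assumed settings, chosen for illustration

rej_sw = rej_dp = 0
for _ in range(reps):
    x = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # a mildly skewed alternative
    rej_sw += stats.shapiro(x).pvalue < alpha        # Shapiro-Wilk
    rej_dp += stats.normaltest(x).pvalue < alpha     # D'Agostino-Pearson K^2
print(f"Shapiro-Wilk power:       {rej_sw / reps:.2f}")
print(f"D'Agostino-Pearson power: {rej_dp / reps:.2f}")
```

The estimated power is just the fraction of simulated non-normal samples each test rejects at the chosen level; published comparisons repeat this over many alternative distributions and sample sizes.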

If you have ties beyond those due to the ordinary rounding of real numbers to some reasonable number of figures, you can reject normality immediately (say, if your data were counts!).

> it can be fairly rare that two samples give identical readings, though they may share commonality to several decimal places.

The occasional such tie -- or a small fraction of ties -- should present no problem for the Shapiro-Wilk; the test is affected by ties, but a few of them shouldn't be a big issue.

Royston (1989)[1] says:

> The Shapiro-Wilk test [...] should not be used if the grouping interval exceeds 0.1 standard deviation units.

That's quite a generous limit. With a normal distribution, a grouping interval of 0.1 s.d. would typically produce only about 35 unique values in a sample of 100. Here is an example of Royston's edge case at n = 100:

[Figure: histogram of a sample of n = 100 from a normal distribution, with values grouped to a 0.1 s.d. interval.]

One of the values is repeated ten times. That's what he's saying is okay (just).

You need either a really tiny standard deviation or pretty heavy rounding to do worse than this.
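A quick way to check the "about 35 unique values" figure is to simulate it; this sketch (my own, with an arbitrary seed and replication count) rounds standard normal samples of size 100 to a 0.1 s.d. grid and counts the distinct values:

```python
import numpy as np

rng = np.random.default_rng(0)
interval = 0.1   # Royston's limiting grouping interval, in s.d. units
counts = [len(np.unique(np.round(rng.standard_normal(100) / interval)))
          for _ in range(5000)]
print(np.mean(counts))   # average count of distinct values; comes out near 35
```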

The same paper suggests a modification for ties in that situation.

> when specifically I should consider switching to the D'Agostino-Pearson (somewhat less favored by some who hold sway for some reason)

If you mean the test based on the sample skewness and kurtosis, then the reason is obvious enough: it simply doesn't perform quite as well overall. When the non-normality shows up in skewness or kurtosis it's an excellent test, often displaying quite good power, but not every non-normal distribution differs substantively from the normal in skewness or kurtosis. Indeed, it's a trivial matter to find distinctly non-normal distributions with the same skewness and kurtosis as the normal.

There's an example here of a distribution with the same skewness and kurtosis as the normal, but you can see it's non-normal at a glance! (You may find that post useful more broadly.)

The D'Agostino $K^2$ test has very poor power against such distributions, but the Shapiro-Wilk has no trouble with them.
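For concreteness, here is one such construction of my own (not necessarily the distribution in the post linked above): a symmetric mixture whose weights are solved so that the population skewness is 0 and the kurtosis is exactly 3, as for the normal, yet the density is strongly bimodal. In runs of this kind one should expect $K^2$ to reject at roughly the nominal rate while the Shapiro-Wilk rejects far more often:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def odd_mixture(n):
    """Symmetric mixture with skewness 0 and kurtosis exactly 3 (like the
    normal), but strongly bimodal. Components (each scaled to variance 1):
      - Laplace:                        kurtosis 6
      - equal mix of N(+/-sqrt(0.9), 0.1): kurtosis 1.38
    The Laplace weight w solves 6w + 1.38(1 - w) = 3, i.e. w = 27/77.
    This construction is mine, chosen for illustration."""
    w = 27 / 77
    lap = rng.laplace(scale=1 / np.sqrt(2), size=n)
    sign = rng.choice([-1.0, 1.0], size=n)
    bim = sign * np.sqrt(0.9) + rng.normal(scale=np.sqrt(0.1), size=n)
    return np.where(rng.random(n) < w, lap, bim)

n, reps, alpha = 100, 2000, 0.05
rej_sw = rej_dp = 0
for _ in range(reps):
    x = odd_mixture(n)
    rej_sw += stats.shapiro(x).pvalue < alpha        # Shapiro-Wilk
    rej_dp += stats.normaltest(x).pvalue < alpha     # D'Agostino-Pearson K^2
print(f"Shapiro-Wilk rejection rate:       {rej_sw / reps:.2f}")
print(f"D'Agostino-Pearson rejection rate: {rej_dp / reps:.2f}")
```

Since $K^2$ only looks at the third and fourth moments, a distribution built to match the normal on exactly those moments leaves it essentially blind, whatever the rest of the shape does.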

> does anyone have a rationale for how similar two values should be in order to be considered ties?

For the statistical issues ties raise here, values are usually considered tied only if they are exactly equal at the full precision you have recorded. Of course, if you have reported many more figures than are actually meaningful, that is a different issue.
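If you want to check your own data against this, a small helper along the following lines may be useful (a sketch of mine: the name `tie_report` and the minimum-gap proxy for the grouping interval are my own choices, not anything from Royston's paper). It counts exact ties at the recorded precision and compares the data's resolution with the 0.1 s.d. limit quoted above:

```python
import numpy as np

def tie_report(x):
    """Count exact ties (at the recorded precision) and compare the data's
    resolution to Royston's 0.1 s.d. limit. The resolution is taken as the
    smallest positive gap between sorted distinct values -- a crude proxy
    for the rounding/grouping interval."""
    x = np.asarray(x, dtype=float)
    vals, counts = np.unique(x, return_counts=True)
    n_tied = int(np.sum(counts[counts > 1]))   # observations involved in ties
    resolution = np.min(np.diff(vals)) if len(vals) > 1 else np.nan
    print(f"{n_tied} of {len(x)} observations share a value with another")
    print(f"grouping interval ~ {resolution / x.std(ddof=1):.3f} s.d. "
          f"(Royston's limit: 0.1)")

# example: normal readings rounded to one decimal place
tie_report(np.round(np.random.default_rng(3).normal(10, 2, 60), 1))
```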

[1]: Royston, J. P. (1989), "Correcting the Shapiro-Wilk W for ties," Journal of Statistical Computation and Simulation, 31(4).