When using the Shapiro-Wilk test, should I look at the p-values or the W values to find out which of my samples is the "most" normal? For example, if I'm running the Shapiro-Wilk test on iq, age, and weight and want to know which of them is closest to normal, should I look at the p-value or at W, and what number should it be close to?
Solved – Normality identifier in Shapiro-Wilk test
normality-assumption
Related Solutions
The ultimate reason why the Shapiro-Wilk is popular has, I think, less to do with PubMed or NIST than with its excellent power in a wide variety of situations of interest (which would in turn lead to wider implementation and hence popularity); it generally comes out toward the top against a wide variety of non-normal distributions in power comparisons with other possible choices. I wouldn't claim it's the best possible omnibus test of normality, but it's a very solid choice.
The Shapiro-Wilk is impacted by ties, but a few ties shouldn't be a big issue. When continuous quantities are recorded to a reasonable number of figures, it can be fairly rare that two observations give identical readings, though they may agree to several decimal places, and the occasional such tie -- or a small fraction of ties -- should present no problem for the test. If you have ties beyond those produced by that ordinary rounding of real numbers (say, because your data are counts), the data can't really be continuous, and you can reject normality immediately.
Royston, 1989[1] says:
The Shapiro-Wilk test [...] should not be used if the grouping interval exceeds 0.1 standard deviation units.
That's pretty big. With a normal distribution, a grouping interval of 0.1 s.d. would only produce about 35 unique values out of 100. In an example of Royston's edge case at n = 100, one of the values is repeated ten times; that's what he's saying is okay (just). You need either a really tiny s.d. or pretty heavy rounding to do worse than this.
The same paper suggests a modification for ties in that situation.
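To get a concrete feel for that edge case, here is a small simulation sketch of my own (not from Royston's paper), using base R's shapiro.test:

    # Round standard normal draws to a grid of 0.1 s.d. and see how many
    # distinct values remain, and how much W changes.
    set.seed(1)
    x <- rnorm(100)                     # n = 100, as in the edge case above
    x_grouped <- round(x / 0.1) * 0.1   # grouping interval of 0.1 s.d.

    length(unique(x_grouped))           # well under 100 distinct values
    shapiro.test(x)$statistic           # W on the ungrouped data
    shapiro.test(x_grouped)$statistic   # W after grouping; usually changes very little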
When, specifically, should I consider switching to the D'Agostino-Pearson (somewhat less favored by some who hold sway, for some reason)?
If you mean the test based on the skewness and kurtosis, then the reason is obvious enough: it simply doesn't perform quite as well overall. When there are differences in skewness or kurtosis it's an excellent test, often displaying quite good power, but not every non-normal distribution differs substantively in skewness or kurtosis. Indeed, it's a trivial matter to find distinctly non-normal distributions with the same skewness and kurtosis as the normal.
There's an example here which has skewness and kurtosis the same as for the normal, but you can see it's non-normal at a glance! (You may find that post useful more broadly.)
The D'Agostino $K^2$ test has very poor power against those, but Shapiro-Wilk has no trouble with them.
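As a rough illustration of that last point (this particular construction is my own, not the example from the linked post): a symmetric uniform-Laplace mixture can match the normal's skewness (0) and kurtosis (3) exactly, yet the Shapiro-Wilk will often flag it at a moderate sample size, while a test built only on skewness and kurtosis has essentially nothing in the population moments to detect.

    # With probability 5/7 draw from Uniform(-1, 1), otherwise from a Laplace
    # with scale 1/sqrt(6).  Both components have variance 1/3, and the mixture
    # has skewness 0 and kurtosis exactly 3, just like the normal.
    set.seed(42)
    n <- 2000
    use_unif <- runif(n) < 5/7
    laplace  <- (rexp(n) - rexp(n)) / sqrt(6)   # Laplace via a difference of exponentials
    x <- ifelse(use_unif, runif(n, -1, 1), laplace)

    shapiro.test(x)   # often rejects normality at this sample size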
Does anyone have a rationale for how similar two values should be in order to be considered ties?
For the statistical issues relevant to ties (as here), usually they're tied if they're exactly equal to as many figures as you have. Of course if you have given many more figures than are meaningful, that may be a different issue.
[1]: Royston, J.P. (1989), "Correcting the Shapiro-Wilk W for ties," Journal of Statistical Computation and Simulation, 31(4).
What matters for ANOVA is the normality of the residuals rather than that of the raw scores. The Shapiro-Wilk test is only one of the possible ways of checking normality; others include boxplots, plot(resid(model)), and z-scores of skewness and kurtosis via stat.desc(resid(model), norm = TRUE) (with the pastecs package). Never rely on Shapiro alone. In fact, the z-scores are possibly the go-to figures, as they are robust to sample size and don't require you to be terribly experienced with instances of non-normality. The key figures are skew.2SE and kurt.2SE, and they are expected to lie below about 1 in absolute value (1.96/2, or roughly 0.98) if the distribution is normal.
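A minimal sketch of those checks, assuming a one-way design stored in a data frame dataset with a numeric outcome score and a factor condition (these names are hypothetical, not from the original answer):

    # Fit the ANOVA and inspect the residuals rather than the raw outcome.
    library(pastecs)

    model <- aov(score ~ condition, data = dataset)
    res   <- resid(model)

    shapiro.test(res)            # Shapiro-Wilk on the residuals
    boxplot(res)                 # quick visual check
    plot(res)                    # residuals in observation order
    stat.desc(res, norm = TRUE)  # includes skew.2SE and kurt.2SE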
Now, if those residuals are non-normal, then consider solutions. Besides non-parametric alternatives to ANOVA, you might try replacing your dependent variable with a transformation of it, e.g., log(), and then checking the residuals again.
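For instance, still with the hypothetical score and condition names, and assuming every value of score is positive so the log is defined:

    # Refit with a log-transformed outcome and re-check the residuals.
    model_log <- aov(log(score) ~ condition, data = dataset)
    shapiro.test(resid(model_log))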
Aside from that, if you want to check the normality of the variables themselves, do it per condition. Personally, I would enter the data in long format, with condition as one column, and then use subsetting, i.e., dataset[dataset$condition=='0',], dataset[dataset$condition=='1',], and so on.
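In code, the per-condition checks might look like this (again with hypothetical names):

    # Shapiro-Wilk within each condition separately.
    shapiro.test(dataset$score[dataset$condition == '0'])
    shapiro.test(dataset$score[dataset$condition == '1'])

    # Or run it for every level of the factor at once:
    by(dataset$score, dataset$condition, shapiro.test)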
For a more advanced discussion of normality (along the lines of @Glen_b's comments), see this question.
Best Answer
You're asking for something like an effect size (a "how big?" type of question).
p-values don't measure that; at a given value of W, the p-value tends to go down as n goes up.
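A quick way to see this behaviour (an illustrative simulation of my own, not part of the original answer) is to draw mildly skewed samples of increasing size: W stays in roughly the same neighbourhood while the p-value collapses.

    # Mildly right-skewed data at several sample sizes.
    set.seed(7)
    for (n in c(50, 200, 1000)) {
      x  <- rexp(n)^0.5        # square root of an exponential: modest skew
      sw <- shapiro.test(x)
      cat(sprintf("n = %4d   W = %.3f   p = %.4g\n", n, sw$statistic, sw$p.value))
    }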
The Shapiro-Wilk statistic, W, is in some sense a measure of "closeness to what you'd expect to see with normality", akin to a squared correlation (if I recall correctly, the closely related Shapiro-Francia statistic is actually a squared correlation between the data and the normal scores, while the Shapiro-Wilk tends to be slightly larger; I seem to recall that it takes into account correlations between order statistics).
Specifically, values of W closer to 1 indicate "closer to what you'd expect if the distribution the data were drawn from is normal".
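So if you do want to line the W values up for your variables, extracting them is straightforward (dat and the column names here are placeholders for your own data); just keep the caveats below in mind before reading much into the comparison.

    # W for each variable; the one closest to 1 is "most normal" by this criterion.
    w_values <- sapply(list(iq = dat$iq, age = dat$age, weight = dat$weight),
                       function(v) unname(shapiro.test(v)$statistic))
    w_values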
However, keep in mind it's a random variable; samples can exhibit random fluctuations that don't represent their populations, and summary statistics will follow suit.
It's not immediately clear that it necessarily makes sense to compare Shapiro-Wilk statistics across data-sets in order to declare one set "more normal" than another; even less so with very different variables and different sample sizes.
Further, choosing the one closest to 1 among a collection of samples may actually be choosing something other than values randomly selected from a normal distribution, for a variety of reasons. For example, goodness of fit tests generally tend to be biased tests; what makes their criterion "closest" isn't necessarily the thing the test is actually designed to pick up. (I don't know what sorts of small-sample biases the Shapiro-Wilk specifically may have, however.)
Finally, I don't see any useful point to such an exercise. What possible value can there be in such a procedure?