Shapiro-Wilk Test – Is the Shapiro-Wilk Test W an Effect Size?

effect-size, hypothesis-testing, normality-assumption

I want to avoid misusing normality tests where a large enough sample size will highlight any slight non-normality. I want to be able to say that a distribution is "normal enough".

When the population is non-normal the p-value for the Shapiro-Wilk test tends to 0 as the sample size increases. The p-value isn't helpful in deciding if a distribution is "normal enough".

I think a solution would be to measure the effect size of the non-normality and reject anything which is more non-normal than a threshold.

The Shapiro-Wilk test produces a test statistic $W$. Is this a way to measure the effect size of the non-normality?

I tested this in R by running a Shapiro-Wilk test on samples drawn from a uniform distribution, with sample sizes ranging from 10 to 5000, and plotting $W$ against the sample size. The value of $W$ does converge to a constant, and that constant is not $1$. I'm unsure whether $W$ is biased for small samples; it seems to be low for small sample sizes. If $W$ is a biased estimate of effect size, that could be a problem if I want to accept anything under $W=0.1$ as "normal enough".
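The experiment described above can be sketched roughly as follows. This is not the original script, which was not posted; the grid of sample sizes, the number of replications (`reps`) and the seed are assumptions chosen for illustration.

```r
# Sketch: average Shapiro-Wilk W for uniform samples of increasing size.
# shapiro.test() accepts sample sizes between 3 and 5000.
set.seed(1)                                   # assumed seed, for reproducibility
sizes <- c(10, 20, 50, 100, 500, 1000, 5000)  # assumed grid of sample sizes
reps  <- 200                                  # assumed replications per size

mean_w <- sapply(sizes, function(n) {
  # W for each of `reps` independent uniform samples of size n, then averaged
  mean(replicate(reps, shapiro.test(runif(n))$statistic))
})

data.frame(n = sizes, mean_W = round(mean_w, 4))
```

Averaging over replications separates the systematic dependence of $W$ on $n$ from the sampling noise of a single draw.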

My two questions are:

1. Is $W$ a measure of effect size of non-normality?
2. Is $W$ biased for small sample sizes?

Best Answer

As you know, $W$ is a test statistic. In most cases (for all consistent tests), a test statistic is not a suitable effect estimator, because the statistic reflects the sample size, whereas an effect estimator should not depend on it. Think of an asymptotic test of zero mean based on the central limit theorem: the approximate null distribution is the same for all $n$, so all the information about the sample size is carried by the test statistic itself. That makes the test statistic unsuitable as an effect estimator.

For $W$, the situation is similar (although its approximate distribution depends on the sample size as well). The lower bound for $W$ is $\frac{n a_1^2}{n-1}$, where the coefficient $a_1$ depends on the expectation of the smallest order statistic.
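For reference, the bound can be read against the usual definition of the statistic. The notation $x_{(i)}$, $m$ and $V$ is introduced here for illustration and does not appear in the answer itself: $x_{(i)}$ is the $i$-th smallest observation, and $m$ and $V$ are the mean vector and covariance matrix of the order statistics of a standard normal sample of size $n$.

```latex
W = \frac{\left(\sum_{i=1}^{n} a_i\, x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a^\top = \frac{m^\top V^{-1}}{\left(m^\top V^{-1} V^{-1} m\right)^{1/2}},
\qquad
\frac{n a_1^2}{n-1} \le W \le 1
```

Because the lower end of the range depends on $n$ through $a_1$, the same numeric value of $W$ does not sit at the same place in its range for different sample sizes.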

So no, it is not a suitable effect estimator at all.

In fact, I think you are not yet sure what you are looking for, as the term "effect" is harder to pin down here than in the usual parametric world of one-dimensional parameters. Here, the raw effect of not being normally distributed is infinite-dimensional: each measurable subset of $\mathbb{R}$ can have a probability different from the one the normal model assigns to it. To get a one-dimensional effect, you need to weight these deviations somehow and be aware of the consequences of the chosen weights for your intended application. The weighting decides, e.g., whether a certain bimodal distribution with Gaussian tails counts as more normal than a certain unimodal distribution with heavy tails. In fact, how to trade tail behaviour against non-tail behaviour might be the most relevant question in defining a suitable effect.

Then it will be much easier to find an estimator for this particular effect.
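As one illustration of what a population-level effect could look like, the sketch below computes the largest gap between the CDF of a uniform distribution on $[0, 1]$ and the CDF of the normal distribution with the same mean and variance. The choice of the sup-distance between CDFs is an assumption made for this example only; it is just one of many possible weightings, and the answer does not recommend it in particular.

```r
# Sketch: a sample-size-free "distance from normality" for U(0, 1),
# using the sup-distance between CDFs as one (assumed) choice of effect.
grid <- seq(-1, 2, length.out = 10001)        # evaluation grid covering [0, 1]

unif_cdf   <- punif(grid)                     # CDF of U(0, 1)
normal_cdf <- pnorm(grid, mean = 0.5,         # normal with matching mean
                    sd = sqrt(1 / 12))        # and matching variance

sup_dist <- max(abs(unif_cdf - normal_cdf))   # largest vertical gap
sup_dist
```

This quantity is a property of the population rather than of a sample, so it has no sample size to depend on. A different weighting, for example one that emphasises the tails, could rank distributions differently.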

Related Solutions

Shapiro-Wilk Test – Interpretation of Shapiro-Wilk Test Results

No - you cannot say "the sample has a normal distribution" or "the sample comes from a population which has a normal distribution", but only "you cannot reject the hypothesis that the sample comes from a population which has a normal distribution".

In fact the sample does not have a normal distribution (see the qqplot below), but you would not expect it to as it is only a sample. The question as to the distribution of the underlying population remains open.

qqnorm( c(0.269, 0.357, 0.2, 0.221, 0.275, 
          0.277, 0.253, 0.127, 0.246) )

[Figure: normal Q-Q plot of the nine sample values]

Solved – Normality identifier in Shapiro-Wilk test

You're asking for something like an effect size (a "how big?" type of question).

P-values don't measure that; at a given value of W, the p-value tends to go down as n goes up.
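A small simulation can make this concrete. It is a sketch under assumed settings: uniform data as the non-normal population, three sample sizes, 200 replications and a fixed seed, none of which come from the answer.

```r
# Sketch: W stays in a similar range while the p-value collapses as n grows.
set.seed(123)                                 # assumed seed
sizes <- c(10, 100, 1000)                     # assumed sample sizes
reps  <- 200                                  # assumed replications per size

sim <- sapply(sizes, function(n) {
  res <- replicate(reps, {
    fit <- shapiro.test(runif(n))
    c(W = unname(fit$statistic), p = fit$p.value)
  })
  # res is a 2 x reps matrix with rows "W" and "p"
  c(median_W = median(res["W", ]), median_p = median(res["p", ]))
})
colnames(sim) <- paste0("n=", sizes)
sim
```

The median $W$ changes comparatively little across the three columns, whereas the median p-value drops by many orders of magnitude.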

The Shapiro-Wilk statistic, $W$, is in some sense a measure of "closeness to what you'd expect to see with normality", akin to a squared correlation. (If I recall correctly, the closely related Shapiro-Francia test is actually a squared correlation between the ordered data and the normal scores, while the Shapiro-Wilk statistic tends to be slightly larger; I seem to recall that it takes into account correlations between order statistics.)
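The squared-correlation reading can be checked informally. The sketch below computes a Shapiro-Francia style statistic by hand, using `qnorm(ppoints(n, a = 3/8))` as an approximation to the expected normal order statistics, and sets it next to the $W$ reported by `shapiro.test()`. The sample, the seed and the choice of plotting positions are assumptions for illustration.

```r
# Sketch: squared correlation with normal scores vs. Shapiro-Wilk W.
set.seed(42)                                  # assumed seed
x <- rnorm(50)                                # assumed example sample
n <- length(x)

# Approximate expected normal order statistics (Blom-type plotting positions)
scores <- qnorm(ppoints(n, a = 3/8))

w_prime <- cor(sort(x), scores)^2             # Shapiro-Francia style statistic
w       <- unname(shapiro.test(x)$statistic)  # Shapiro-Wilk W

c(W = w, W_prime = w_prime)
```

On a sample like this the two values are typically close, which supports reading $W$ as something akin to a squared correlation rather than as a quantity on an arbitrary scale.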

Specifically values closer to 1 indicate "closer to what you'd expect if the distribution the data were drawn from is normal".

However, keep in mind it's a random variable; samples can exhibit random fluctuations that don't represent their populations, and summary statistics will follow suit.

It's not immediately clear that it necessarily makes sense to compare Shapiro-Wilk statistics across data-sets in order to declare one set "more normal" than another; even less so with very different variables and different sample sizes.

Further, choosing the one closest to 1 among a collection of samples may actually be choosing something other than values randomly selected from a normal distribution, for a variety of reasons. For example, goodness of fit tests generally tend to be biased tests; what makes their criterion "closest" isn't necessarily the thing the test is actually designed to pick up. (I don't know what sorts of small-sample biases the Shapiro-Wilk specifically may have, however.)

Finally, I don't see any useful point to such an exercise. What possible value can there be in such a procedure?
