What's a good normality test for time-series / sleep data?

I have data that is, at its most basic level, activity counts per minute of fruit flies. Flies are considered to be asleep when they have no activity for at least 5 minutes. Plotted as activity vs. time or sleep vs. time, the data look roughly sinusoidal. I have researched various normality tests and have so far been using the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), and Anderson-Darling (AD) tests. I've ended up mostly using the AD test because it's supposed to be a better form of the KS test (which I have seen used in the literature). If it matters, the biological reason for non-normality is that these flies are mutants and some of them tend to be grossly hyperactive. I hesitate to use a normality test just because it's commonly seen in the literature. So, are there better tests than the ones I've used so far? And are there specific reasons to use them with time-series / sleep data?
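For concreteness, here's a minimal sketch (Python with numpy; the function name and data layout are illustrative, not my actual pipeline) of how sleep could be scored under that definition:

```python
import numpy as np

def sleep_minutes(activity, min_bout=5):
    """Mark a minute as asleep if it falls inside a run of zero-activity
    minutes lasting at least `min_bout` minutes (5 by convention here)."""
    activity = np.asarray(activity)
    asleep = np.zeros(activity.shape, dtype=bool)
    run_start = None
    for i, count in enumerate(activity):
        if count == 0:
            if run_start is None:
                run_start = i          # a quiet stretch begins
        else:
            if run_start is not None and i - run_start >= min_bout:
                asleep[run_start:i] = True   # long enough to count as sleep
            run_start = None
    # handle a quiet stretch that runs to the end of the record
    if run_start is not None and len(activity) - run_start >= min_bout:
        asleep[run_start:] = True
    return asleep

# A 5-minute quiet stretch scores as sleep; a 3-minute one does not.
counts = [3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2]
print(sleep_minutes(counts).astype(int))  # [0 0 0 0 0 1 1 1 1 1 0]
```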
Tags: anderson-darling-test, kolmogorov-smirnov-test, normality-assumption, time-series
Related Solutions
There can be no single state of the art for goodness of fit (for example, no uniformly most powerful (UMP) test across general alternatives exists, and really nothing even comes close -- even highly regarded omnibus tests have terrible power in some situations).
In general when selecting a test statistic you choose the kinds of deviation that it's most important to detect and use a test statistic that is good at that job. Some tests do very well at a wide variety of interesting alternatives, making them decent default choices, but that doesn't make them "state of the art".
The Anderson-Darling is still very popular, and with good reason. The Cramér-von Mises test is much less used these days (to my surprise, because it's usually better than the Kolmogorov-Smirnov but simpler than the Anderson-Darling -- and often has better power than the Anderson-Darling against differences "in the middle" of the distribution).
All of these tests suffer from bias against some kinds of alternatives, and it's easy to find cases where the Anderson-Darling does much worse (terribly, really) than the other tests. (As I suggested, it's more "horses for courses" than one test to rule them all.) Unfortunately, there's often little consideration given to the question that matters here: which test is best at picking up the deviations that matter most to me?
You may find some value in some of these posts:
2 Sample Kolmogorov-Smirnov vs. Anderson-Darling vs Cramer-von-Mises (about two-sample tests, but many of the statements carry over)
Motivation for Kolmogorov distance between distributions (more theoretical discussion but there are several important points about practical implications)
I don't think you'll be able to form a confidence interval for the cdf from the Cramér-von Mises and Anderson-Darling statistics, because those criteria are based on all of the deviations rather than just the largest one.
First a general comment: note that the Anderson-Darling test is for completely specified distributions, while the Shapiro-Wilk is for normals with any mean and variance. However, as noted in D'Agostino & Stephens$^{[1]}$, the Anderson-Darling adapts in a very convenient way to the estimation case, akin to the Lilliefors adaptation of the Kolmogorov-Smirnov, but converging faster and modified in a way that's simpler to deal with. Specifically, at the normal, by $n=5$ the asymptotic tables for the modified statistic $A^*=A^2\left(1+\frac{4}{n}-\frac{25}{n^2}\right)$ may be used (and you shouldn't be testing goodness of fit for $n<5$ anyway).
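As a rough illustration of that recipe, here's a minimal sketch (Python with scipy assumed) that computes $A^2$ at the normal with estimated mean and sd and applies the modification above. Note that scipy.stats.anderson handles the estimated-parameter case with its own size-adjusted critical values, so treat this purely as a demonstration of the formula:

```python
import numpy as np
from scipy import stats

def anderson_darling_normal(x):
    """A^2 for normality with mean and sd estimated from the data,
    plus the modified statistic A* quoted above."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = stats.norm.cdf((x - x.mean()) / x.std(ddof=1))
    i = np.arange(1, n + 1)
    # A^2 = -n - (1/n) * sum (2i-1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    a2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))
    a_star = a2 * (1 + 4 / n - 25 / n**2)
    return a2, a_star

rng = np.random.default_rng(1)
x = rng.normal(size=50)
print(anderson_darling_normal(x))
print(stats.anderson(x, dist='norm').statistic)  # scipy's A^2, for comparison
```

For percentage points of $A^*$, use the tables in D'Agostino & Stephens$^{[1]}$ rather than any value I might quote from memory.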
I have read somewhere in the literature that the Shapiro-Wilk test is considered to be the best normality test because, for a given significance level $\alpha$, the probability of rejecting the null hypothesis when it's false is higher than with the other normality tests.
As a general statement this is false.
Which normality tests are "better" depends on which classes of alternatives you're interested in. One reason the Shapiro-Wilk is popular is that it tends to have very good power under a broad range of useful alternatives. It comes up in many studies of power, and usually performs very well, but it's not universally best.
It's quite easy to find alternatives under which it's less powerful.
For example, against light-tailed alternatives it often has less power than the studentized range $u=\frac{\max(x)-\min(x)}{\operatorname{sd}(x)}$ (compare them on a test of normality on uniform data, for example: at $n=30$, a test based on $u$ has power of about 63%, compared to a bit over 38% for the Shapiro-Wilk).
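A quick simulation along those lines (a sketch in Python with scipy; the seed and replication count are arbitrary) reproduces the flavor of those numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 30, 10000, 0.05

def u_stat(x):
    return (x.max() - x.min()) / x.std(ddof=1)

# Calibrate an equal-tailed two-sided rejection region for u under the normal null.
u_null = np.array([u_stat(rng.normal(size=n)) for _ in range(reps)])
lo, hi = np.quantile(u_null, [alpha / 2, 1 - alpha / 2])

u_rej = sw_rej = 0
for _ in range(reps):
    x = rng.uniform(size=n)                  # light-tailed alternative
    u = u_stat(x)
    u_rej += (u < lo) or (u > hi)
    sw_rej += stats.shapiro(x).pvalue < alpha

print(f"power, u-test: {u_rej / reps:.2f}; Shapiro-Wilk: {sw_rej / reps:.2f}")
# should land in the ballpark of the ~63% vs ~38% figures quoted above
```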
The Anderson-Darling (adjusted for parameter estimation) does better at the double exponential. Moment-skewness does better against some skew alternatives.
Could you please explain to me, using mathematical arguments if possible, how exactly it works compared to some of the other normality tests (say the Anderson–Darling test)?
I will explain in general terms (if you want more specific details the original papers and some of the later papers that discuss them would be your best bet):
Consider a simpler but closely related test, the Shapiro-Francia; it's effectively a function of the correlation between the order statistics and the expected order statistics under normality (and as such, a pretty direct measure of "how straight the line is" in the normal Q-Q plot). As I recall, the Shapiro-Wilk is more powerful because it also takes into account the covariances between the order statistics, producing a best linear estimator of $\sigma$ from the Q-Q plot, which is then compared with $s$. When the distribution is far from normal, the ratio isn't close to 1.
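To make that concrete, here's a minimal sketch of the Shapiro-Francia idea (Python with scipy assumed; Blom's formula is a standard approximation to the expected normal order statistics, and you'd still need the published null distribution to get p-values):

```python
import numpy as np
from scipy import stats

def shapiro_francia_w(x):
    """Squared correlation between the sorted sample and approximate
    expected normal order statistics -- i.e., Q-Q plot straightness."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Blom's approximation to the expected normal order statistics
    m = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    r = np.corrcoef(x, m)[0, 1]
    return r**2  # near 1 for normal samples, smaller otherwise

rng = np.random.default_rng(2)
print(shapiro_francia_w(rng.normal(size=100)))       # close to 1
print(shapiro_francia_w(rng.exponential(size=100)))  # noticeably smaller
```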
By comparison, the Anderson-Darling, like the Kolmogorov-Smirnov and the Cramér-von Mises, is based on the empirical CDF. Specifically, it's based on weighted deviations between the ECDF and the theoretical CDF (the weighting for variance makes it more sensitive to deviations in the tails).
The test by Chen and Shapiro$^{[2]}$ (1995), based on spacings between order statistics, often exhibits slightly more power than the Shapiro-Wilk (but not always); the two often perform very similarly.
--
Use the Shapiro-Wilk because it's often powerful, widely available, and many people are familiar with it (removing the need to explain in detail what it is if you use it in a paper) -- just don't use it under the illusion that it's "the best normality test". There isn't one best normality test.
[1]: D'Agostino, R. B. and Stephens, M. A. (1986), Goodness-of-Fit Techniques, Marcel Dekker, New York.
[2]: Chen, L. and Shapiro, S. (1995), "An alternative test for normality based on normalized spacings," Journal of Statistical Computation and Simulation 53, 269-287.
Best Answer
I don't see why a normality test would be any more or less applicable just because the data are from a time series. You seem to have a sense of the pitfalls associated with the common normality tests; perhaps you realize, for example, that they can be very sensitive to sample size at both extremes.

More important is why you want to test for normality. Is the reason to identify outliers? There's a potential circularity there. Suppose you find your data to be nonnormal, but that excluding certain outliers moves you closer to normality. That's a slippery slope: you could keep excluding outlying cases until the test shows a normal distribution, and then argue that "of course these excluded cases must be outliers." It would amount to something akin to cherry-picking. For all we know, the distribution might be better seen as some nonnormal one in which the putative outliers are just where they ought to be.
Alternatively, if you're looking to establish normality for the sake of meeting the normality assumptions of procedures such as ANOVA or regression, note that these procedures can handle fairly nonnormal data without a problem as long as the residuals are reasonably normal. And there it's largely a matter of individual taste whether you use a formal normality test or a graphical method such as a histogram or Q-Q plot of the residuals. You'll find plenty of discussion about these topics on this site if you do a search.
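If you go the graphical route, something like this sketch (Python with scipy and matplotlib assumed; the toy regression is just for illustration) is all it takes:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)  # toy data

slope, intercept = np.polyfit(x, y, 1)   # simple linear fit
residuals = y - (intercept + slope * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(residuals, bins=20)             # histogram of residuals
ax1.set_title("Residual histogram")
stats.probplot(residuals, dist="norm", plot=ax2)  # normal Q-Q plot
ax2.set_title("Normal Q-Q plot")
plt.tight_layout()
plt.show()
```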