What's a good normality test for time-series / sleep data?

Tags: anderson darling test, kolmogorov-smirnov test, normality-assumption, time series

I have data that are, at the most basic level, activity counts per minute for fruit flies. Flies are considered to be asleep when they show no activity for at least 5 minutes. Plotted as activity vs. time or sleep vs. time, the data look roughly like a sinusoidal curve. I have researched various normality tests and have so far been using the Shapiro-Wilk (SW), Kolmogorov-Smirnov (KS), and Anderson-Darling (AD) tests. I've ended up using the AD test mostly because it's supposed to be a better form of the KS test (which I have seen used in the literature). If it matters, the biological reason for non-normality is that these flies are mutants and some of them tend to be grossly hyperactive. I hesitate to use a normality test just because it's commonly seen in the literature. So, are there better tests than the ones I've used so far? And are there specific reasons to use them with time-series / sleep data?

Best Answer

I don't see why a normality test would be any more or less applicable just because the data are from a time series. You seem to have a sense of the pitfalls associated with the common normality tests; perhaps you realize, for example, that they are very sensitive to sample size at both extremes: with small samples they have little power to detect even substantial departures from normality, and with very large samples they flag departures too small to matter. More important, though, is why you want to test for normality. Are you saying that the reason is to identify outliers? There's a potential circularity there. Suppose you find your data to be nonnormal but that, by excluding certain outliers, you move closer to normality. This could be a slippery slope: you could continue excluding outlying cases until you obtain what the test shows to be a normal distribution, and then argue that "of course these excluded cases must be outliers." It would amount to something akin to cherry-picking. For all we know, the distribution might be better seen as some nonnormal one in which the putative outliers are just where they ought to be.
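To illustrate the sample-size point, here is a minimal simulation sketch in Python. It assumes NumPy and SciPy are available; the heavy-tailed Student t distribution (5 degrees of freedom), the sample sizes, and the helper name `rejection_rate` are arbitrary choices made for illustration, not anything taken from the fly data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable


def rejection_rate(n, trials=100, alpha=0.05, df=5):
    """Fraction of simulated samples of size n for which Shapiro-Wilk rejects.

    Samples are drawn from a Student t distribution with `df` degrees of
    freedom, an arbitrary stand-in for moderately heavy-tailed data.
    """
    rejections = 0
    for _ in range(trials):
        sample = rng.standard_t(df, size=n)
        _, p_value = stats.shapiro(sample)
        if p_value < alpha:
            rejections += 1
    return rejections / trials


# Same underlying distribution, very different sample sizes.
rate_small = rejection_rate(n=20)
rate_large = rejection_rate(n=2000)

print(f"n=20:   normality rejected in {rate_small:.0%} of samples")
print(f"n=2000: normality rejected in {rate_large:.0%} of samples")
```

The distribution is identical in both runs; only the sample size changes, so any difference in how often the test rejects reflects the test's power rather than the data's shape.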

Alternatively, if you're looking to establish normality for the sake of meeting the normality assumptions of procedures such as ANOVA or regression, please note that these procedures can handle fairly nonnormal data without a problem as long as the residuals are fairly normal. There, it's a matter of individual taste whether to judge by a normality test or by a graphical method such as a histogram or a Q-Q plot of the residuals. You'll find plenty of discussion about these topics on this site if you do a search.
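As a rough sketch of checking residuals rather than raw data, the Python snippet below fits a sinusoidal trend by least squares and then examines the residuals. Everything about the data is an assumption made for illustration: the activity values are simulated, the 1440-minute (24-hour) period is assumed rather than estimated, and the Gaussian noise is a convenience that real count data would not necessarily follow.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: 3 days of per-minute activity with a daily rhythm.
minutes = np.arange(3 * 1440)
period = 1440.0  # assumed 24-hour cycle, in minutes
angle = 2 * np.pi * minutes / period
activity = 10 + 5 * np.sin(angle) + rng.normal(0, 2, size=minutes.size)

# Least-squares fit of intercept + sine + cosine terms.
X = np.column_stack([np.ones(minutes.size), np.sin(angle), np.cos(angle)])
coef, *_ = np.linalg.lstsq(X, activity, rcond=None)
residuals = activity - X @ coef

# Formal test on the residuals, not on the raw activity values.
_, shapiro_p = stats.shapiro(residuals)

# Q-Q plot coordinates; qq_r is the correlation of the points with a line.
(theoretical_q, ordered_resid), (slope, intercept, qq_r) = stats.probplot(
    residuals, dist="norm"
)

print(f"Shapiro-Wilk p-value on residuals: {shapiro_p:.3f}")
print(f"Q-Q plot correlation: {qq_r:.4f}")
```

Plotting `theoretical_q` against `ordered_resid` gives the Q-Q plot itself; points falling close to a straight line suggest the residuals are roughly normal, even though the raw series is clearly not.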