The next step is to compare your value to a critical value for the test-statistic. From the same Wikipedia page:
"If $A^{*2}$ exceeds a given critical value, then the hypothesis of normality is rejected with some significance level."
That is, your null hypothesis is that the data are generated from a normal distribution, and an $A^{*2}$ exceeding the critical value leads you to reject normality at that level of significance. However, the $A^{*2}$ statistic for a normal distribution is not itself normally distributed, per this resource:
"The Anderson-Darling test makes use of the specific distribution in calculating critical values. This has the advantage of allowing a more sensitive test and the disadvantage that critical values must be calculated for each distribution."
The same source points to books and papers for these critical values. You might be able to find the CDF of the $A^2$ statistic for each distribution and use it to compute a p-value.
"It is my understanding that the AD-test does not suffer from the same issue outlined above as it calculates critical values for each distribution being tested against as an inherent part of the test - is this correct?"
No, it's a test for a fully specified distribution, just like the Kolmogorov-Smirnov. When you estimate parameters but use the Kolmogorov-Smirnov statistic with different tables to account for that, it's properly called the Lilliefors test; that test is discussed in numerous posts on this site.
However, in many cases you can adjust the Anderson-Darling test statistic under the estimation of parameters. (Failing that, your approach of simulation to either get new critical values or p-values for the specific case at hand can work quite well.)
For example, in the case of estimating mean and variance and testing for normality, if the usual statistic $A^2$ is replaced with $A^{*2}=(1+4/n-25/n^2)A^2$ then the usual tables may be used to reasonable accuracy even at quite small sample sizes.
Alternatively in the case of normality, there's a different adjusted statistic $A^{*2}=(1+0.75/n+2.25/n^2)A^2$ (with its own table). To my recollection, details are given in D'Agostino and Stephens (1986) [1], but this result (and more) can be seen in this technical report Stephens (1979); it also has an adjusted statistic for the exponential case (and several other distributions, but not the gamma).
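As a rough illustration of the two adjustments above, here is a minimal Python sketch that computes $A^2$ for a normality test with the mean and variance estimated from the data, and then applies both scaling factors. The function name and the sample values are made up for illustration, and the use of the $n-1$ divisor for the standard deviation is an assumption of this sketch.

```python
import math
from statistics import NormalDist, fmean, stdev

def ad_normal_estimated(x):
    """Return (A2, first adjusted A2, alternative adjusted A2) for a
    normality test where mean and sd are estimated from x."""
    n = len(x)
    # Assumption of this sketch: sd is the usual sample sd (n - 1 divisor).
    fitted = NormalDist(fmean(x), stdev(x))
    u = sorted(fitted.cdf(v) for v in x)  # fitted CDF at the order statistics
    s = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log1p(-u[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - s / n
    a2_star_first = (1 + 4 / n - 25 / n**2) * a2      # first adjustment in the text
    a2_star_alt = (1 + 0.75 / n + 2.25 / n**2) * a2   # alternative adjustment
    return a2, a2_star_first, a2_star_alt

# Hypothetical sample, for illustration only.
x = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9,
     5.2, 3.6, 6.3, 4.8, 5.0, 5.7, 4.2, 5.4, 4.6, 5.8]
a2, a2_star_first, a2_star_alt = ad_normal_estimated(x)
print(a2, a2_star_first, a2_star_alt)
```

Each adjusted value would then be compared with the table that goes with that particular adjustment; the sketch deliberately stops short of hard-coding any critical values.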
Note that the adjustment for the exponential in the report cited above again comes with its own table. This page gives one for the Weibull and Gumbel cases.
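The simulation fallback mentioned earlier in this answer can be sketched for the exponential case with the scale estimated by the sample mean. This is only an illustrative Python sketch: the function names, the number of replications and the example data are assumptions, not anything taken from the report.

```python
import math
import random

def ad_exponential_estimated(x):
    """A^2 against an exponential whose scale is estimated by the sample mean."""
    n = len(x)
    scale = sum(x) / n
    z = sorted(v / scale for v in x)
    log_cdf = [math.log(-math.expm1(-t)) for t in z]  # ln F(x_(i))
    log_sf = [-t for t in z]                          # ln(1 - F(x_(i)))
    s = sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n))
    return -n - s / n

def simulate_null(n, reps=5000, seed=1):
    """Sorted A^2 values from exponential samples of size n,
    re-estimating the scale for every simulated sample."""
    rng = random.Random(seed)
    # Rate 1 is enough: once the scale is re-estimated each time, the null
    # distribution of the statistic does not depend on the true scale.
    return sorted(
        ad_exponential_estimated([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    )

# Hypothetical positive data, for illustration only.
data = [0.8, 2.1, 0.3, 5.6, 1.2, 0.9, 3.4, 0.1, 1.7, 2.6,
        0.5, 4.2, 1.1, 0.7, 2.9, 0.2, 1.5, 6.8, 0.4, 1.9]

null = simulate_null(len(data))
crit_5pct = null[math.ceil(0.95 * len(null)) - 1]   # simulated 5% critical value
observed = ad_exponential_estimated(data)
p_value = (1 + sum(s >= observed for s in null)) / (1 + len(null))
print(crit_5pct, observed, p_value)
```

The key design point is that the parameter is re-estimated inside every replication, so the simulated null distribution already accounts for the effect of estimation.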
"Is this done on an individual-distribution basis? For example, will $\Gamma(\alpha=3,\beta=10)$ produce a different set of critical values than $\Gamma(\alpha=2,\beta=15)$?"
Certainly in the case of the beta parameter, you only need to adjust for the fact of estimation itself -- the value of the parameter will make no difference (this is because -- depending on the parameterization you intended -- $\beta$ is either a scale parameter or the inverse of a scale parameter). It's not clear that the alpha parameter always has the same property, but I can be reasonably confident that except at small values of $\alpha$ (i.e. except down near 1 and below 1) it won't matter much. If your gamma has a peak that's well to the right of 0 (i.e. $\alpha \gg 1$), the cube root is an approximately normalizing transform, so different $\alpha$ shouldn't impact the distribution much; most of the effect will be caused by the estimation of $\alpha$ rather than by its value. (This will be the case more generally: if there's a monotonic transformation which doesn't depend on the parameters and which produces a location-scale family, the parameter values themselves won't matter.)
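The point about the scale parameter can be checked numerically. The Python sketch below makes some simplifying assumptions that are not in the question: the shape is treated as known and equal to 3 (so the gamma CDF has a closed form), $\beta$ is read as a scale, and only the scale is estimated. It covers the scale argument only, not the effect of $\alpha$.

```python
import math
import random

SHAPE = 3  # assumption of this sketch: shape treated as known

def gamma3_cdf(t):
    """CDF at t >= 0 of a gamma with shape 3 and unit scale (closed form)."""
    return 1.0 - math.exp(-t) * (1.0 + t + t * t / 2.0)

def ad_gamma_scale_estimated(x):
    """A^2 against a shape-3 gamma whose scale is estimated from x."""
    n = len(x)
    scale_hat = sum(x) / (n * SHAPE)  # MLE of the scale when the shape is known
    u = sorted(gamma3_cdf(v / scale_hat) for v in x)
    s = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log1p(-u[n - 1 - i]))
        for i in range(n)
    )
    return -n - s / n

rng = random.Random(42)
base = [rng.gammavariate(SHAPE, 1.0) for _ in range(30)]

# The same underlying draws, rescaled to scale 10 and to scale 15.
a2_scale_10 = ad_gamma_scale_estimated([10.0 * v for v in base])
a2_scale_15 = ad_gamma_scale_estimated([15.0 * v for v in base])
print(a2_scale_10, a2_scale_15)
```

Because the statistic depends on the data only through the ratios of the observations to the estimated scale, the two values agree up to floating-point rounding, so the critical values cannot depend on the true scale.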
So the fact that there's a transformation to almost-normality for any reasonably large $\alpha$ suggests that the adjustment used for the normal ($A^{*2}=(1+4/n-25/n^2)A^2$) may also work fairly well for gammas -- at least those with non-small shape parameter.
For cases where the shape parameter is -- or might be -- small (somewhere in the region of 1, or smaller), you may need to consider dealing with it individually, but I don't know for sure that it's necessary (the same adjustment may also work okay even there, but you'd need to check).
"Are there any cases when the KS-test is preferred over the AD-test?"
If by "is preferred" you mean "has greater power", then certainly. Where the Anderson-Darling tends to perform better at picking up differences in the tail (especially where the tail may be heavier than supposed), the Kolmogorov-Smirnov tends to have better power at differences "in the middle", and where tails are lighter than supposed. [In some of these situations -- such as testing a uniform against a beta with parameters somewhat larger than 1 -- both of them perform badly and you probably would prefer something else over either, but the Kolmogorov-Smirnov can be considerably less terrible than the Anderson-Darling.]
Note that I wouldn't really call the Anderson-Darling a "modified form of the Kolmogorov-Smirnov". I'd say it's a modified (specifically, weighted) form of the Cramer-von Mises test. They're all tests based on the ECDF, but the Kolmogorov-Smirnov looks at the maximum distance while the Cramer-von Mises looks at something related to a sum-of-squared distances, which looks at things quite differently (and has generally better power against the more interesting alternatives). The Anderson-Darling then adjusts for the fact that the variance of the value of the ECDF is smaller near $0$ or $1$ than it is near $\frac12$, to re-weight those squared deviations for the relative precision (which is why it does better at finding deviations in the tail, especially ones that would tend to be associated with being "further" into the tail of the specified distribution).
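To make the comparison concrete, here is a small Python sketch of the three ECDF-based statistics for a fully specified distribution, written in terms of the values $u_i = F(x_i)$. The function name and the sample are hypothetical, and the computational formulas are the standard textbook ones rather than anything quoted in this answer.

```python
import math
from statistics import NormalDist

def ecdf_statistics(u):
    """Return (KS, Cramer-von Mises, Anderson-Darling) statistics for
    values u = F(x) under a fully specified hypothesised CDF F."""
    u = sorted(u)
    n = len(u)
    # Kolmogorov-Smirnov: largest vertical gap between ECDF and F.
    ks = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))
    # Cramer-von Mises: based on summed squared gaps, equal weight everywhere.
    cvm = 1 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    # Anderson-Darling: squared gaps weighted by 1 / (F (1 - F)),
    # which puts more weight on the tails.
    ad = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log1p(-u[n - 1 - i]))
        for i in range(n)
    ) / n
    return ks, cvm, ad

# Hypothetical sample tested against a fully specified N(0, 1).
sample = [-1.9, -0.7, -0.3, 0.1, 0.2, 0.4, 0.8, 1.1, 1.6, 2.8]
hypothesised = NormalDist(0.0, 1.0)
ks, cvm, ad = ecdf_statistics([hypothesised.cdf(v) for v in sample])
print(ks, cvm, ad)
```

The only structural difference between the last two statistics is the weight applied to the squared deviations, which is the sense in which the Anderson-Darling is a weighted Cramer-von Mises.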
[1] D'Agostino, R.B.; Stephens, M.A. (1986),
Goodness-of-Fit Techniques,
New York: Marcel Dekker.
Best Answer
For a fully specified distribution, the Anderson-Darling - as with the Kolmogorov-Smirnov, the Cramer-von Mises, the Kuiper test and many other ecdf-based tests - is distribution-free.
So you don't need tables for the 'standard t' such as that represented by the cdf function pt. All you need do is apply pt to your data (data$^\dagger$) and test that for uniformity ... which is effectively what these tests all do, and that's how goftest::ad.test works -- it uses fully specified distributions.
The asymptotic distribution of the Anderson-Darling test statistic for completely specified distributions was worked out by Anderson and Darling (1952, 1954) -- the 1952 paper gives the theory of computation of the asymptotic distribution (of a large class of tests of the Cramer-von Mises type, including specific discussion of what would become known as the Anderson-Darling test) and the 1954 paper gives asymptotic 10%, 5% and 1% critical values for the Anderson-Darling statistic.
In that paper they say that convergence to the asymptotic distribution is rapid and suggest it should be okay by $n=40$. Stephens (1974) suggests using it for $n\ge 5$.
Peter Lewis (1961) did tabulations of the distribution for $n\le 8$.
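Both the distribution-free property and the small-sample behaviour can be explored with a short simulation. The sketch below is in Python rather than R and uses only the standard library, so a standard normal and a unit exponential stand in for the t distribution (there is no Student-t CDF in the standard library); the function names, seeds and replication count are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

EPS = 1e-15  # guard against log(0) if a value lands exactly on 0 or 1

def ad_statistic(u):
    """A^2 for values already mapped through the hypothesised CDF."""
    u = sorted(min(max(v, EPS), 1.0 - EPS) for v in u)
    n = len(u)
    s = sum(
        (2 * i + 1) * (math.log(u[i]) + math.log1p(-u[n - 1 - i]))
        for i in range(n)
    )
    return -n - s / n

# 1. Distribution-free: push the same uniforms through two different
#    quantile functions, then test each sample against its own CDF.
rng = random.Random(0)
p = [rng.random() for _ in range(10)]
std_normal = NormalDist()
x_normal = [std_normal.inv_cdf(v) for v in p]
x_expo = [-math.log1p(-v) for v in p]
a2_normal = ad_statistic([std_normal.cdf(v) for v in x_normal])
a2_expo = ad_statistic([-math.expm1(-v) for v in x_expo])

# 2. Small-sample null distribution: simulate A^2 for uniform samples.
def simulated_quantile(n, prob=0.95, reps=20000, seed=1):
    sim = random.Random(seed)
    stats = sorted(
        ad_statistic([sim.random() for _ in range(n)]) for _ in range(reps)
    )
    return stats[math.ceil(prob * reps) - 1]

cv_n10 = simulated_quantile(10)
print(a2_normal, a2_expo, cv_n10)
```

The two statistics in the first part agree up to rounding because, after applying the hypothesised CDF, both samples reduce to the same uniform values. The simulated quantile in the second part is a Monte Carlo estimate, so it varies with the seed and replication count; it can be compared with the small-sample figure quoted later in this answer.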
A little testing of goftest::ad.test suggests that the code isn't using the asymptotic distribution down at $n=10$, however (e.g. simulation at $n=10$ shows that the 5% CV there is around 2.512, which is larger than the asymptotic value). So if all else fails, let's read the help.
The help for ad.test refers to Marsaglia and Marsaglia (2004). They use simulation for $n=2^k$ for $k=3,4,5,6,7$ to identify a simple 3-piece transformation of the small-sample test statistic (scaled by 1/n) so that the asymptotic distribution can be used for small $n$. So it seems their code takes the statistic, scales it a little bit according to their piecewise transformation, and then compares that to the asymptotic distribution.
Anderson, T. W.; Darling, D. A. (1952).
"Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes"
Annals of Mathematical Statistics 23: 193–212.
Anderson, T.W. and Darling, D.A. (1954).
"A Test of Goodness-of-Fit",
Journal of the American Statistical Association 49: 765–769.
Stephens, M.A. (1974).
"EDF Statistics for Goodness of Fit and Some Comparisons",
Journal of the American Statistical Association 69 (347): 730–737.
Lewis, P.A. (1961).
"Distribution of the Anderson-Darling Statistic",
Annals of Mathematical Statistics 32: 1118–1124.
Marsaglia, G. and Marsaglia, J. (2004)
"Evaluating the Anderson-Darling Distribution."
Journal of Statistical Software, 9 (2), 1-5. February.
http://www.jstatsoft.org/v09/i02
$\dagger$ About which: