The same considerations apply as to the distribution of the Kolmogorov–Smirnov test statistic discussed here. The Anderson–Darling test statistic (for a given sample size) has a distribution that (1) doesn't depend on the null-hypothesis distribution when all parameters are known, & (2) depends only on the functional form of the null-hypothesis distribution when location & scale parameters are estimated. I don't know of an R implementation of the A–D test specifically for the exponential distribution with estimated rate parameter, but you could quickly make a function to calculate the test statistic by adapting the `ad.test` function from the `nortest` package: change the distribution function from the best-fit normal, `pnorm((x - mean(x))/sd(x))`, to the best-fit exponential, `pexp(x/mean(x))`. Then get critical values for any desired significance level & sample size by simulation.
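As a sketch of that recipe (in Python rather than R, and with function names of my own invention), the A–D statistic for a best-fit exponential plus simulated critical values might look like:

```python
import numpy as np

def ad_exp(x):
    """Anderson-Darling statistic for an exponential fit with the
    rate estimated as 1/mean(x) -- the analogue of pexp(x/mean(x))."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    u = 1.0 - np.exp(-x / x.mean())  # best-fit exponential CDF values
    i = np.arange(1, n + 1)
    # Standard computational form of A^2 on the sorted CDF values
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

def ad_exp_critical(n, alpha=0.05, nsim=10_000, seed=None):
    """Simulated critical value for sample size n at level alpha,
    accounting for the estimation of the rate parameter."""
    rng = np.random.default_rng(seed)
    stats = [ad_exp(rng.exponential(size=n)) for _ in range(nsim)]
    return np.quantile(stats, 1 - alpha)
```

Reject at level `alpha` when `ad_exp(x)` exceeds `ad_exp_critical(len(x), alpha)`; because the rate is a scale parameter, the simulated null distribution doesn't depend on its true value.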

As to the "best" test, note that different tests are more powerful against different kinds of departure from the null-hypothesis distribution. If you have a quite specific alternative in mind, e.g. a Weibull distribution with shape parameter greater than one, a likelihood ratio test will be more powerful than a general-purpose goodness-of-fit test. For more vaguely specified alternatives it might be helpful to compare the power of various tests against a rogues' gallery, following the approach of Stephens (1974), "EDF statistics for goodness of fit and some comparisons", *JASA*, **69**, 347.

It depends on what you intend by "better fit".

Goodness of fit statistics measure deviation from perfect fit in some manner (we'll take it as given - as you assume in your question - that our statistic is organized such that smaller values mean closer fit to the distributional model in the null). Such statistics are sensitive to some kinds of deviation, and may be insensitive to other kinds of deviation.

If the kinds of deviation from the hypothesized model that the test statistic is good at picking up are the ones important for you to pick up (for whatever purpose you're testing for), then a smaller value of the statistic does indeed mean the fit is better (in the sense of 'better fit' defined by whatever you're trying to do).

If on the other hand there can be important deviations from the hypothesized model that the test statistic is not sensitive to then a smaller value doesn't necessarily mean better fit for your purposes.

[Note that since people's purposes may differ, what is a better fit for person 1 may not be a better fit for person 2.]

This makes it important to use a statistic that *does* pick up the sort of deviations that are important for you to pick up -- not just pick one at random. A well chosen statistic will then represent better or worse fit in a specific sense that's directly relevant to you.

I'm trying to figure out which distribution is best for a data set of pipe diameters. Since this amounts to the distribution of manufacturing deviations from spec (on top of measurement errors, of course), I believe the data might be normally distributed. There are also historical data which suggest that is usually the case with this particular data set. Do you think that KS is OK for this particular application?

No, I don't, for three reasons:

1. Nothing about your hypothesis specifies the parameter values for the normal distribution, and the Kolmogorov-Smirnov test is for a completely specified distribution.

2. While it's possible to use the same statistic for a general goodness-of-fit test of normality if you calculate new "tables" (a new distribution of the test statistic under H0), at which point you have Lilliefors' test, it's typically less powerful than the Shapiro-Wilk test, so you'd need a good reason (such as a specific alternative in mind that it is better at detecting) to choose Lilliefors.

3. We can actually tell the underlying random variable *isn't* truly normal right from the start (pipe diameters are positive; manufacturing errors are also bounded, on at least one side). You already know that before you even collect data. So the question "*are these data drawn from a normal distribution?*" is one you already know the answer to (no, they're not); failure to reject would only indicate that your sample size was too small to detect it. Your actual question is more like "*is the underlying distribution close enough to normal for my purposes?*". That question isn't answered by any of these tests; it's nearer to a question about effect size (something more like *how non-normal is the distribution, and in what ways?*).

[You might discern that many times that people test goodness of fit, they're not really addressing the question they need answered. One of many posts that have some discussion of issues like that is here.]

If you have a specific reason for wanting to check normality, that underlying reason may also tend to suggest ways to check its plausibility for that purpose.

So why do you need to know these values are from a normal distribution?

I'm trying to calculate a failure probability for this data set. So I have several variables (of which pipe diameter is one), a failure model (a formula that takes in these variables and outputs a resistance value) and some load cases, which allows me (in theory) to describe the joint distribution of the pipe resistance. In order to do that, I'd need to describe each of those variables' data sets. Hence the search for the distribution that best describes these data.

No simple-form distribution is likely to 'best describe' your data -- real data are nearly always more complex than our simple models. To quote George Box --

*Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful*

-- George Box & Norman R. Draper, *Empirical Model-Building and Response Surfaces*

(we can guess these are Box's words rather than Draper's because he's said very similar things elsewhere)

The relevant issue for you seems to be whether your model would enable you to calculate failure probability accurately enough for your purposes. I see no reason why you'd use a goodness-of-fit test for any part of that, since a goodness-of-fit test *simply doesn't address that issue*.

(followup to discussion)

One way to assess the sensitivity to any assumed distribution for inputs in some Monte Carlo simulation of failure rates is to assume some other distribution is the real situation, simulate "real data" from that and see how much your assumption (including any fitting you're doing) affects the results of your subsequent Monte Carlo.
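A minimal sketch of that sensitivity check (the resistance formula and all distributional choices below are made-up toys for illustration, not your actual failure model): generate "real data" from some plausible rogue distribution, fit the assumed normal to it, and compare the downstream failure probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def failure_prob(diam_sample, load=9.0):
    """Toy failure model (hypothetical): resistance proportional to
    diameter squared; failure when resistance falls below the load."""
    resistance = 0.1 * diam_sample**2
    return np.mean(resistance < load)

# Pretend the true diameter distribution is a slightly skewed gamma
# (mean 10, sd 1), while the analysis assumes a fitted normal.
observed = rng.gamma(shape=100, scale=0.1, size=500)
mu, sd = observed.mean(), observed.std(ddof=1)

p_true = failure_prob(rng.gamma(shape=100, scale=0.1, size=100_000))
p_assumed = failure_prob(rng.normal(mu, sd, size=100_000))
print(p_true, p_assumed)  # how much does the normal assumption shift the answer?
```

If the two probabilities differ by less than you care about, the normality assumption is adequate for this purpose regardless of what any goodness-of-fit test says; repeat with several rogue distributions to map out the sensitivity.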

## Best Answer

No, it's a test for a fully specified distribution, just like the Kolmogorov-Smirnov. When you estimate parameters but use the Kolmogorov-Smirnov statistic with different tables to account for that, it's properly called Lilliefors' test; this test is discussed in numerous posts on site.

However, in many cases you can adjust the Anderson-Darling test statistic under the estimation of parameters. (Failing that, your approach of simulation to either get new critical values or p-values for the specific case at hand can work quite well.)

For example, in the case of estimating mean and variance and testing for normality, if the usual statistic $A^2$ is replaced with $A^{*2}=(1+4/n-25/n^2)A^2$ then the usual tables may be used to reasonable accuracy even at quite small sample sizes.
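For concreteness, a sketch (in Python, with names of my own) of computing $A^2$ for normality with estimated mean and variance and applying that first adjustment:

```python
import math
import numpy as np

def ad_normal_adjusted(x):
    """A^2 for a normality test with estimated mean and sd, returned
    with the small-sample adjustment A*^2 = (1 + 4/n - 25/n^2) A^2."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = (x - x.mean()) / x.std(ddof=1)
    # Standard normal CDF of the standardized values
    u = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    i = np.arange(1, n + 1)
    a2 = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
    return (1 + 4 / n - 25 / n**2) * a2
```

The adjusted value can then be compared against the usual asymptotic tables for this estimated-parameters case.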

Alternatively in the case of normality, there's a different adjusted statistic $A^{*2}=(1+0.75/n+2.25/n^2)A^2$ (with its own table). To my recollection, details are given in D'Agostino and Stephens (1986) [1], but this result (and more) can be seen in this technical report Stephens (1979); it also has an adjusted statistic for the exponential case (and several other distributions, but not the gamma).

Note that the above-linked report gives an adjustment for the exponential (again with its own table). This page gives one for the Weibull and Gumbel cases.

Certainly in the case of the beta parameter, you only need to adjust for the fact of estimation itself; the value of the parameter will make no difference (because, depending on the parameterization you intended, $\beta$ is either a scale parameter or the inverse of a scale parameter). It's not clear that the alpha parameter always has the same property, but I can be reasonably confident that except at small values of $\alpha$ (i.e. except down near 1 and below 1) it won't matter much. If your gamma has a peak that's well to the right of 0 (i.e. $\alpha \gg 1$), the cube root is an approximately normalizing transform, so different $\alpha$ shouldn't impact the distribution much; most of the effect will be caused by the estimation of alpha rather than its value. (This will be the case more generally: if there's a monotonic transformation, not depending on the parameters, that produces a location-scale family, the parameter values themselves won't matter.)

So the fact that there's a transformation to almost-normality for any reasonably large $\alpha$ suggests that the adjustment used for the normal ($A^{*2}=(1+4/n-25/n^2)A^2$) may also work fairly well for gammas, at least those with non-small shape parameter.

For cases where the shape parameter is -- or might be -- small (somewhere in the region of 1, or smaller), you may need to consider dealing with it individually, but I don't know for sure that it's necessary (the same adjustment may also work okay even there, but you'd need to check).
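The cube-root (Wilson–Hilferty) claim above is easy to check by simulation; a small sketch comparing sample skewness before and after the transform:

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    """Standardized third sample moment."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

# Gamma skewness is 2/sqrt(alpha); the cube root largely removes it
# once the shape parameter isn't small.
for shape in (2.0, 10.0, 50.0):
    g = rng.gamma(shape, size=200_000)
    print(shape, round(skewness(g), 3), round(skewness(np.cbrt(g)), 3))
```

The residual skewness after the cube root shrinks rapidly with the shape parameter, which is why the parameter value itself should matter little for the test-statistic distribution at large $\alpha$.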

If by "is preferred" you mean "has greater power", then certainly. Where the Anderson-Darling tends to perform better at picking up differences in the tail (especially where the tail may be heavier than supposed), the Kolmogorov-Smirnov tends to have better power at differences "in the middle", and where tails are lighter than supposed. [In some of these situations -- such as testing a uniform against a beta with parameters somewhat larger than 1 -- both of them perform badly and you probably would prefer something else over either, but the Kolmogorov-Smirnov can be considerably less terrible than the Anderson-Darling.]

Note that I wouldn't really call the Anderson-Darling a "modified form of the Kolmogorov-Smirnov". I'd say it's a modified (specifically, weighted) form of the Cramer-von Mises test. They're all tests based on the ECDF, but the Kolmogorov-Smirnov looks at the maximum distance while the Cramer-von Mises looks at something related to a sum-of-squared distances, which looks at things quite differently (and has generally better power against the more interesting alternatives). The Anderson-Darling then adjusts for the fact that the variance of the value of the ECDF is smaller near $0$ or $1$ than it is near $\frac12$, to re-weight those squared deviations for the relative precision (which is why it does better at finding deviations in the tail, especially ones that would tend to be associated with being "further" into the tail of the specified distribution).
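To make the relationship concrete, all three statistics can be computed from the same sorted probability-integral-transform values $u_{(i)} = F_0(x_{(i)})$ under a fully specified null $F_0$ (a sketch in Python, with names of my own):

```python
import numpy as np

def ecdf_stats(u):
    """Kolmogorov-Smirnov D, Cramer-von Mises W^2 and Anderson-Darling
    A^2 computed from PIT values u = F0(x) for a fully specified null."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    i = np.arange(1, n + 1)
    # KS: maximum ECDF distance
    d = max(np.max(i / n - u), np.max(u - (i - 1) / n))
    # CvM: sum of squared distances
    w2 = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
    # AD: CvM re-weighted by 1/(F(1-F)), emphasizing the tails
    a2 = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
    return d, w2, a2
```

Side by side, the KS term keeps only the single largest deviation, the CvM term sums all the squared deviations, and the AD weighting inflates deviations near $u=0$ and $u=1$, which is exactly the tail-sensitivity described above.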

[1] D'Agostino, R.B. and Stephens, M.A. (1986), *Goodness-of-Fit Techniques*, New York: Marcel Dekker.