I am assuming you are asking because the Suanshu help page reports, in reference to the K-S distribution, "This is not done yet." Luckily, it is very easy to do in R. If x and y are your two samples, ks.test(x, y) returns the test statistic and p-value. For example,
> x <- rnorm(50)
> y <- runif(30)
> ks.test(x, y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.5, p-value = 9.065e-05
alternative hypothesis: two-sided
By default, it will compute exact or asymptotic p-values based on the product of the sample sizes (exact p-values when n.x*n.y < 10000 in the two-sample case), or you can specify this explicitly with a third argument, exact=T or exact=F. Exact p-values are calculated using the method of Marsaglia et al. (2003), which the Suanshu documentation also cites. Some large-sample approximations are given here, although I don't have a proper citation. Lastly, if you don't want to install R, there are web calculators for the two-sample K-S test, although I don't know whether they use the same algorithm as R, because the one I found only reported three decimal places for the p-value.
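For instance, a minimal sketch of forcing one behavior or the other (drawing fresh samples as above):
> x <- rnorm(50)
> y <- runif(30)
> ks.test(x, y, exact = TRUE)   # force the exact p-value
> ks.test(x, y, exact = FALSE)  # force the asymptotic approximation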
The logic of TOST employed for Wald-type t and z test statistics (i.e., $\theta / s_{\theta}$ and $\theta / \sigma_{\theta}$, respectively) can be applied to the z approximations for nonparametric tests like the sign, signed-rank, and rank-sum tests. For simplicity I assume that equivalence is expressed symmetrically with a single term, but extending my answer to asymmetric equivalence terms is straightforward.
One issue that arises when doing this is that if one is accustomed to expressing the equivalence term (say, $\Delta$) in the same units as $\theta$, then the equivalence term must instead be expressed in units of the particular sign, signed-rank, or rank-sum statistic, which is both abstruse and dependent on $N$.
However, one can also express TOST equivalence terms in units of the test statistic itself. Consider that in TOST, if $z = \theta/\sigma_{\theta}$, then $z_{1} = (\Delta - \theta)/\sigma_{\theta}$, and $z_{2} = (\theta + \Delta)/\sigma_{\theta}$. If we let $\varepsilon = \Delta / \sigma_{\theta}$, then $z_{1} = \varepsilon - z$, and $z_{2} = z + \varepsilon$. (The statistics expressed here are both evaluated in the right tail: $p_{1} = \text{P}(Z > z_{1})$ and $p_{2} = \text{P}(Z > z_{2})$.) Using units of the z distribution to define the equivalence/relevance threshold may be preferable for non-parametric tests, since the alternative defines the threshold in units of signed-ranks or rank sums, which may be substantively meaningless to researchers and difficult to interpret.
If we recognize that (for symmetric equivalence intervals) it is not possible to reject any TOST null hypothesis when $\varepsilon \le z_{1-\alpha}$, then we might make decisions about an appropriate size of the equivalence term accordingly, for example $\varepsilon = z_{1-\alpha} + 0.5$.
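To make the bookkeeping concrete, here is a minimal R sketch of this $\varepsilon$-scaled TOST for a generic z statistic; tost_z is a hypothetical helper, and the z value and $\varepsilon$ choice below are purely illustrative:
# Hedged sketch: TOST with the equivalence term in units of z. Here z
# stands in for the z approximation of a sign, signed-rank, or rank-sum
# statistic, and eps = Delta / sigma_theta.
tost_z <- function(z, eps, alpha = 0.05) {
  z1 <- eps - z                         # for H01: theta >= Delta
  z2 <- z + eps                         # for H02: theta <= -Delta
  p1 <- pnorm(z1, lower.tail = FALSE)   # p1 = P(Z > z1)
  p2 <- pnorm(z2, lower.tail = FALSE)   # p2 = P(Z > z2)
  list(p1 = p1, p2 = p2, equivalent = max(p1, p2) <= alpha)
}
tost_z(z = 0.8, eps = qnorm(0.95) + 0.5)  # eps = z_{1-alpha} + 0.5 at alpha = 0.05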
This approach has been implemented, with options for continuity correction, etc., in the package tost for Stata (which now includes specific TOST implementations for the Shapiro-Wilk and Shapiro-Francia tests), which you can install from within Stata.
Edit: While the logic of TOST is sound, and equivalence test formulations have been applied to omnibus tests, I have been persuaded that my solution was based on a deep misunderstanding of the approximate statistics for the Shapiro-Wilk and Shapiro-Francia tests.
Best Answer
Ok, here's my first attempt. Close scrutiny and comments appreciated!
The Two-Sample Hypotheses
If we can frame two-sample one-sided Kolmogorov-Smirnov hypothesis tests, with null and alternative hypotheses along these lines:
$\text{H}_{0}\text{: }F_{Y}\left(t\right) \geq F_{X}\left(t\right)$ for all $t$, and
$\text{H}_{\text{A}}\text{: }F_{Y}\left(t\right) < F_{X}\left(t\right)$ for at least one $t$, where:
the test statistic $D^{-}=\left|\min_{t}\left(F_{Y}\left(t\right) - F_{X}\left(t\right)\right)\right|$ corresponds to $\text{H}_0\text{: }F_{Y}\left(t\right) \geq F_{X}\left(t\right)$;
the test statistic $D^{+}=\left|\max_{t}\left(F_{Y}\left(t\right) - F_{X}\left(t\right)\right)\right|$ corresponds to $\text{H}_0\text{: }F_{Y}\left(t\right) \leq F_{X}\left(t\right)$; and
$F_{Y}\left(t\right)$ & $F_{X}\left(t\right)$ are the empirical CDFs of samples $Y$ and $X$,
then it should be reasonable to create a general interval hypothesis for an equivalence test along these lines (assuming that the equivalence interval is symmetric for the moment):
$\text{H}^{-}_0\text{: }\left|F_{Y}\left(t\right) - F_{X}\left(t\right)\right| \geq \Delta$ for at least one $t$, and
$\text{H}^{-}_{\text{A}}\text{: }\left|F_{Y}\left(t\right) - F_{X}\left(t\right)\right| < \Delta$ for all $t$.
This would translate to the specific two one-sided "negativist" null hypotheses to test for equivalence (these two hypotheses take the same form, since both $D^{+}$ and $D^{-}$ are strictly non-negative):
$\text{H}^{-}_{01}\text{: }D^{+} \geq \Delta$, or
$\text{H}^{-}_{02}\text{: }D^{-} \geq \Delta$.
Rejecting both $\text{H}^{-}_{01}$ and $\text{H}^{-}_{02}$ would lead one to conclude that $-\Delta < F_{Y}\left(t\right) - F_{X}\left(t\right) < \Delta$. Of course, the equivalence interval need not be symmetric, and $-\Delta$ and $\Delta$ could be replaced with $\Delta_{2}$ (lower) and $\Delta_{1}$ (upper) for the respective one-sided null hypotheses.
The Test Statistics (Updated: Delta is outside the absolute value sign)
The test statistics $D^{+}_{1}$ and $D^{-}_{2}$ (leaving the $n_{Y}$ and $n_{X}$ implicit) correspond to $\text{H}^{-}_{01}$ and $\text{H}^{-}_{02}$, respectively, and are:
$D^{+}_{1} = \Delta - D^{+} = \Delta - \left|\max_{t}\left(F_{Y}\left(t\right) - F_{X}\left(t\right)\right)\right|$, and
$D^{-}_{2} = \Delta - D^{-} = \Delta - \left|\min_{t}\left(F_{Y}\left(t\right) - F_{X}\left(t\right)\right)\right|$.
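As an illustration, a minimal R sketch of these statistics computed from the empirical CDFs; ks_equiv_stats is a hypothetical helper, and $\Delta$ must be supplied by the analyst in units of differenced probabilities:
# Hedged sketch: D+, D-, and the TOST statistics above, evaluated on the
# pooled sample points.
ks_equiv_stats <- function(y, x, Delta) {
  t <- sort(c(x, y))              # pooled evaluation points
  d <- ecdf(y)(t) - ecdf(x)(t)    # F_Y(t) - F_X(t)
  Dplus  <- abs(max(d))           # D+
  Dminus <- abs(min(d))           # D-
  c(D1plus = Delta - Dplus, D2minus = Delta - Dminus)
}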
The Equivalence/Relevance Threshold
The interval $[-\Delta, \Delta]$ (or $[\Delta_{2}, \Delta_{1}]$, if using an asymmetric equivalence interval) is expressed in units of $D^{+}$ and $D^{-}$, that is, the magnitude of differenced probabilities. As $n_{Y}$ and $n_{X}$ approach infinity, the CDF of the sample-size-scaled $D^{+}$ (or $D^{-}$) approaches $0$ for $t \le 0$, and for $t > 0$ approaches:
$$\lim_{n_{Y},n_{X}\to \infty}p^{+} = \text{P}\left(\sqrt{\frac{n_{Y}n_{X}}{n_{Y}+n_{X}}}D^{+} \le t\right) = 1 - e^{-2t^{2}}$$
So it seems to me that the PDF for the sample-size-scaled $D^{+}$ (or sample-size-scaled $D^{-}$) must be $0$ for $t \le 0$, and must be $>0$ for $t > 0$:
$$f(t) = \frac{d}{dt}\left[1 - e^{-2t^{2}}\right] = 4te^{-2t^{2}}$$
Glen_b points out that this is a Rayleigh distribution with $\sigma=\frac{1}{2}$. So the large sample quantile function for sample size-scaled $D^{+}$ and $D^{-}$ is:
$$\text{CDF}^{-1} = Q\left(p\right) = \sqrt{\frac{-\ln{\left(1 - p\right)}}{2}}$$
and a liberal choice of $\Delta$ might be the critical value $Q_{1-\alpha}+\sigma/2 = Q_{1-\alpha}+\frac{1}{4}$, and a stricter choice the critical value $Q_{1-\alpha}+\sigma/4=Q_{1-\alpha}+\frac{1}{8}$.
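Pulling the pieces together, here is a hedged R sketch of the large-sample equivalence test under these choices; ks_equiv_test is a hypothetical helper, and the p-value construction (evaluating $\Delta - D$ in the right tail of the Rayleigh limit, mirroring the z-statistic TOST above) is my illustrative reading rather than an established procedure:
# Hedged sketch: two-sample K-S equivalence test via the Rayleigh
# (sigma = 1/2) large-sample limit of the scaled D+ and D-.
ks_equiv_test <- function(y, x, alpha = 0.05, slack = 1/4) {
  nY <- length(y); nX <- length(x)
  scale <- sqrt(nY * nX / (nY + nX))           # sample-size scaling
  t <- sort(c(x, y))
  d <- ecdf(y)(t) - ecdf(x)(t)
  Dp <- scale * abs(max(d))                    # scaled D+
  Dm <- scale * abs(min(d))                    # scaled D-
  Q <- function(p) sqrt(-log(1 - p) / 2)       # quantile function above
  Delta <- Q(1 - alpha) + slack                # slack = 1/4 liberal, 1/8 strict
  surv <- function(q) ifelse(q < 0, 1, exp(-2 * q^2))  # P(T > q)
  p1 <- surv(Delta - Dp)                       # for H01: D+ >= Delta
  p2 <- surv(Delta - Dm)                       # for H02: D- >= Delta
  list(Delta = Delta, p1 = p1, p2 = p2, equivalent = max(p1, p2) <= alpha)
}
ks_equiv_test(rnorm(200), rnorm(300))  # two samples from the same distribution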