Equivalence Tests for Non-Normal Data: Methods and Applications

equivalence, hypothesis-testing, tost

I have some data that I can't necessarily assume to be drawn from normal distributions, and I would like to conduct tests of equivalence between groups. For normal data, there are techniques like TOST (two one-sided t-tests). Is there anything analogous to TOST for non-normal data?

Best Answer

The logic of TOST employed for Wald-type t and z test statistics (i.e. $\theta / s_{\theta}$ and $\theta / \sigma_{\theta}$, respectively) can be applied to the z approximations for nonparametric tests like the sign, signed rank, and rank sum tests. For simplicity I assume that equivalence is expressed symmetrically with a single term, but extending my answer to asymmetric equivalence terms is straightforward.

One issue that arises when doing this is that if one is accustomed to expressing the equivalence term (say, $\Delta$) in the same units as $\theta$, then the equivalence term must be expressed in units of the particular sign, signed rank, or rank sum statistic, which is both abstruse and dependent on N.

However, one can also express TOST equivalence terms in units of the test statistic itself. Consider that in TOST, if $z = \theta/\sigma_{\theta}$, then $z_{1} = (\Delta - \theta)/\sigma_{\theta}$, and $z_{2} = (\theta + \Delta)/\sigma_{\theta}$. If we let $\varepsilon = \Delta / \sigma_{\theta}$, then $z_{1} = \varepsilon - z$, and $z_{2} = z + \varepsilon$. (The statistics expressed here are both evaluated in the right tail: $p_{1} = \text{P}(Z > z_{1})$ and $p_{2} = \text{P}(Z > z_{2})$.) Using units of the z distribution to define the equivalence/relevance threshold may be preferable for non-parametric tests, since the alternative defines the threshold in units of signed-ranks or rank sums, which may be substantively meaningless to researchers and difficult to interpret.
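As a minimal sketch of the calculation above, the following Python uses only the standard library. The function names `rank_sum_z` and `tost_z` are hypothetical, the rank sum z approximation assumes no ties and applies no continuity correction, and the data are made up purely for illustration. It is not the Stata implementation.

```python
from statistics import NormalDist


def rank_sum_z(x, y):
    """z approximation for the Wilcoxon rank sum statistic of sample x.

    Assumptions: no tied values, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Rank sum of the first sample (ranks start at 1).
    w = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (w - mean_w) / sd_w


def tost_z(z, eps, alpha=0.05):
    """Two one-sided tests with the threshold eps in units of z.

    Both p-values are right-tail probabilities, as in the text:
    p1 = P(Z > eps - z) and p2 = P(Z > z + eps).
    """
    std_normal = NormalDist()
    p1 = 1 - std_normal.cdf(eps - z)
    p2 = 1 - std_normal.cdf(z + eps)
    # Conclude equivalence only if both one-sided nulls are rejected.
    equivalent = max(p1, p2) < alpha
    return p1, p2, equivalent


# Illustrative (made-up) data with no ties.
x = [1.1, 2.3, 2.9, 4.2, 5.0, 6.4]
y = [1.4, 2.0, 3.3, 4.0, 5.5, 6.1]

alpha = 0.05
z = rank_sum_z(x, y)
eps = NormalDist().inv_cdf(1 - alpha) + 0.5  # threshold in z units
p1, p2, equivalent = tost_z(z, eps, alpha)
print(f"z = {z:.3f}, p1 = {p1:.4f}, p2 = {p2:.4f}, equivalent = {equivalent}")
```

Because $\varepsilon$ is stated in z units, the same `tost_z` call would apply unchanged to the z approximation of a sign or signed rank statistic; only the function producing `z` changes.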

If we recognize that (for symmetric equivalence intervals) it is not possible to reject any TOST null hypothesis when $\varepsilon \le z_{1-\alpha}$, then we might proceed to make decisions on the appropriate size of the equivalence term accordingly. For example, one might set $\varepsilon = z_{1-\alpha} + 0.5$.
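To see why this bound holds, assume each one-sided null is rejected when its statistic exceeds the critical value $z_{1-\alpha}$. Concluding equivalence requires both rejections at once:

$$z_{1} = \varepsilon - z > z_{1-\alpha} \quad \text{and} \quad z_{2} = z + \varepsilon > z_{1-\alpha} \iff |z| < \varepsilon - z_{1-\alpha}.$$

The right-hand side is positive only when $\varepsilon > z_{1-\alpha}$, so for smaller $\varepsilon$ no value of $z$ can satisfy both inequalities. With the example choice $\varepsilon = z_{1-\alpha} + 0.5$, equivalence is concluded only when $|z| < 0.5$.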

This approach has been implemented with options for continuity correction, etc. in the package tost for Stata (which now includes specific TOST implementations for the Shapiro-Wilk and Shapiro-Francia tests), which you can access by typing in Stata:

Edit: While the logic of TOST is sound, and equivalence test formulations have been applied to omnibus tests, I have been persuaded that my solution was based on a deep misunderstanding of the approximate statistics for the Shapiro-Wilk and Shapiro-Francia tests.