Equivalence Testing – Equivalence Test for Binomial Data: A Comprehensive Guide

equivalence, tost

I want to apply an equivalence test to my two samples to infer whether they are equivalent or not.

Since my data are binomial (0/1), I don’t know whether the TOST procedure (tost() in R) can handle my problem.

My data consist of two groups (G1 and G2) of unequal size, e.g. G1 = 164 people and G2 = 280 people. The samples are binomial: each group records whether a person finished or failed to complete a game (our case study), coded as 1 and 0, respectively. E.g., G1 includes 55 players who finished the game, and the rest failed. The same holds for G2, with different numbers. My question is: what is the best equivalence test for this type of data? I ran tost() in R and got a result, but I am not sure whether the test is correct, since the function automatically calculates a mean and SD. For example, tost() calculates the mean for G1 above as m = 0.35, but when I calculate the mean with the n*p formula (p = 0.5), I obtain m = 82.

The number of successes in G1 is 58 out of 164, and in G2 it is 113 out of 280. Here is my code:

    equivalence::tost(G1,G2,paired = FALSE, epsilon=0.15, var.equal=FALSE,conf.level = 0.95, alpha = 0.05)

And here is the result that I got:

    Welch Two Sample TOST

    data:  G1 and G2
    df = 348.24
    sample estimates:
    mean of G1 mean of G2 
    0.3536585 0.4035714 

    Epsilon: 0.15 
    95 percent two one-sided confidence interval (TOST interval):
        -0.12840509  0.02857931
    Null hypothesis of statistical difference is: rejected 
    TOST p-value: 0.01809221

Best Answer

While one can use the t test to test for proportion difference, the z test is a tad more precise, since it uses an estimate of the standard deviation formulated specifically for binomial (i.e. dichotomous, nominal, etc.) data. The same applies to the z test for proportion equivalence.

First, the z test for difference in proportions of two independent samples is pretty straightforward:

About z tests for unpaired proportion difference
The null hypothesis is $H_{0}\text{: }p_{1} - p_{2} = 0$ (i.e. $H_{0}\text{: }p_{1} = p_{2}$), with $H_{\text{A}}\text{: }p_{1} - p_{2} \ne 0$.

$z = \frac{\hat{p}_{1}-\hat{p}_{2}}{\sqrt{\hat{p}\left(1-\hat{p}\right)\left[\frac{1}{n_{1}} + \frac{1}{n_{2}}\right]}}$,
where:
$\hat{p}_{1}$ and $\hat{p}_{2}$ are the sample proportions in group 1 and group 2;
$n_{1}$ and $n_{2}$ are the sample sizes in group 1 and group 2; and
$\hat{p}$ is the estimate of the common proportion if $H_{0}$ is true, the best guess of which is simply the overall sample proportion (i.e. of all the data, ignoring which group an observation is from).

You might want to consider a continuity correction. For example, Hauck and Anderson's (1986) correction gives:

$c_{\text{HA}} = \frac{1}{2\min{(n_{1},n_{2})}}$, and a redefined $s_{\hat{p}}$:

$s_{\hat{p}}= \sqrt{ \frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}-1} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}-1}}$, so that

$z = \frac{\left|\hat{p}_{1} - \hat{p}_{2}\right| - c_{\text{HA}}}{\sqrt{ \frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}-1} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}-1}}}$

The appropriate $p$-value for this $z$-statistic is then calculated or looked up in a table, and compared to $\alpha/2$ (two-tailed test).

About z tests for unpaired proportion equivalence
Because all differences are "statistically significant" given a large enough sample size, it is a good idea to decide beforehand what the smallest relevant difference in proportions is to you, and then look for evidence of such relevance. You find such evidence by combining the inferences from the test for difference just described, with a test for equivalence.

Suppose you decide beforehand that a meaningful difference in proportions for your purposes is one that is at least 0.05 (i.e. $|p_{1} - p_{2}| \ge 0.05$). Then the corresponding test for equivalence of proportions for two independent groups is:

$H^{-}_{0}\text{: }|p_{1} - p_{2}| \ge 0.05$, which translates into two one-sided null hypotheses:

$H^{-}_{01}\text{: }p_{1} - p_{2} \ge 0.05$
$H^{-}_{02}\text{: }p_{1} - p_{2} \le -0.05$

These two one-sided null hypotheses can be tested with the following statistics (both are constructed for upper-tail one-sided tests):

$z_{1} = \frac{0.05 - \left(\hat{p}_{1}-\hat{p}_{2}\right)}{\sqrt{\hat{p}\left(1-\hat{p}\right)\left[\frac{1}{n_{1}} + \frac{1}{n_{2}}\right]}}$, and
$z_{2} = \frac{\left(\hat{p}_{1}-\hat{p}_{2}\right)+0.05}{\sqrt{\hat{p}\left(1-\hat{p}\right)\left[\frac{1}{n_{1}} + \frac{1}{n_{2}}\right]}}$.

With a continuity correction $z_{1}$ and $z_{2}$ instead become (see Tu, 1997):

$z_{1} = \frac{0.05 - \left(\hat{p}_{1}-\hat{p}_{2}\right) + c_{\text{HA}}}{\sqrt{ \frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}-1} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}-1}}}$, and
$z_{2} = \frac{\left(\hat{p}_{1}-\hat{p}_{2}\right)+0.05-c_{\text{HA}}}{\sqrt{ \frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}-1} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}-1}}}$.

If you reject both $H^{-}_{01}$ and $H^{-}_{02}$ (both tested at $\alpha$, not $\alpha/2$, and both tested with right tail rejection regions), then you can conclude that you have evidence of equivalence.
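The two one-sided tests above can be wrapped in a small helper. This is only a minimal sketch without the continuity correction: the function name `tost_two_prop` and its arguments are hypothetical, and the counts (58 of 164, 113 of 280) and the margin of 0.15 are borrowed from the question purely for illustration.

```r
# Hypothetical helper: two one-sided z tests for equivalence of two
# independent proportions (no continuity correction).
tost_two_prop <- function(x1, n1, x2, n2, Delta, alpha = 0.05) {
  p1 <- x1 / n1                        # sample proportion, group 1
  p2 <- x2 / n2                        # sample proportion, group 2
  p  <- (x1 + x2) / (n1 + n2)          # pooled proportion
  se <- sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
  z1 <- (Delta - (p1 - p2)) / se       # tests H01: p1 - p2 >= Delta
  z2 <- ((p1 - p2) + Delta) / se       # tests H02: p1 - p2 <= -Delta
  pval1 <- pnorm(z1, lower.tail = FALSE)  # upper-tail p-value for H01
  pval2 <- pnorm(z2, lower.tail = FALSE)  # upper-tail p-value for H02
  list(z1 = z1, z2 = z2, pval1 = pval1, pval2 = pval2,
       # equivalence requires rejecting BOTH one-sided nulls at alpha
       equivalent = (pval1 <= alpha) && (pval2 <= alpha))
}

# Illustrative call; Delta = 0.15 is an assumed margin, not a recommendation
res <- tost_two_prop(x1 = 58, n1 = 164, x2 = 113, n2 = 280, Delta = 0.15)
print(res)
```

Because both nulls must be rejected, the larger of `pval1` and `pval2` is commonly reported as the overall TOST p-value.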

**About *relevance tests***
*Finally*... if you combine inference from tests of $H_{0}$ *and* $H^{-}_{0}$ (i.e. test for difference and test for equivalence), then you get one of the following possibilities:

reject $H_{0}$ and reject $H^{-}_{0}$: conclude trivial difference between proportions (i.e. yes there is a difference, but it's too small for you to care about because it is smaller than 0.05);
reject $H_{0}$ and not reject $H^{-}_{0}$: conclude relevant difference between proportions (i.e. larger than 0.05);
not reject $H_{0}$ and reject $H^{-}_{0}$: conclude equivalence of proportions; or
not reject $H_{0}$ and not reject $H^{-}_{0}$: conclude indeterminate (i.e. underpowered tests).
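The four outcomes listed above amount to a lookup on two decisions. A minimal sketch, assuming you already have the two reject/not-reject decisions as logical values (the function and argument names are hypothetical):

```r
# Hypothetical helper: combine the decision on H0 (test for difference)
# with the decision on H0^- (test for equivalence).
relevance_conclusion <- function(reject_H0, reject_H0minus) {
  if (reject_H0 && reject_H0minus) {
    "trivial difference"       # a difference exists but is smaller than Delta
  } else if (reject_H0 && !reject_H0minus) {
    "relevant difference"
  } else if (!reject_H0 && reject_H0minus) {
    "equivalence"
  } else {
    "indeterminate"            # neither null rejected: underpowered tests
  }
}

relevance_conclusion(reject_H0 = FALSE, reject_H0minus = TRUE)
```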

R code

First the test for difference:

Assume g1 and g2 are vectors containing the binomial data for group 1 and group 2 respectively.

    n1 <- length(g1) #sample size group 1
    n2 <- length(g2) #sample size group 2
    p1 <- sum(g1)/n1 #p1 hat
    p2 <- sum(g2)/n2 #p2 hat
    n <- n1 + n2 #overall sample size
    p <- sum(g1,g2)/n #p hat
    cHA <- 1/(2*min(n1,n2))

    # without continuity correction
    z <- (p1 - p2)/sqrt(p*(1-p)*(1/n1 + 1/n2)) #test statistic
    pval <- 1 - pnorm(abs(z)) #p-value reject H0 if it is 
                              #<= alpha/2 (two-tailed)

    # with continuity correction
    zHA <- (abs(p1 - p2) - cHA)/sqrt((p1*(1-p1)/(n1-1)) + 
            (p2*(1-p2)/(n2-1))) #with continuity correction
    pvalHA <- 1 - pnorm(abs(zHA)) #p-value reject H0 if it is 
                                  #<= alpha/2 (two-tailed)
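To try the uncorrected test on the question's data, the 0/1 vectors can be rebuilt from the reported counts. The sketch below does that and cross-checks the statistic against base R's `prop.test()` with `correct = FALSE`, whose chi-squared statistic equals the square of the pooled z statistic. Note that the mean of a 0/1 vector is simply the sample proportion, which is why `tost()` reported 0.35 rather than a count.

```r
# Rebuild 0/1 vectors from the counts in the question (58/164 and 113/280)
g1 <- rep(c(1, 0), times = c(58, 164 - 58))
g2 <- rep(c(1, 0), times = c(113, 280 - 113))

n1 <- length(g1)
n2 <- length(g2)
p1 <- mean(g1)                  # mean of 0/1 data = sample proportion
p2 <- mean(g2)
p  <- sum(g1, g2) / (n1 + n2)   # pooled proportion

# z statistic for difference, without continuity correction
z <- (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Two-sided p-value (compare with alpha, not alpha/2)
pval_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Cross-check against prop.test(): X-squared should equal z^2
pt <- prop.test(x = c(sum(g1), sum(g2)), n = c(n1, n2), correct = FALSE)
print(all.equal(unname(pt$statistic), z^2))
print(all.equal(pt$p.value, pval_two_sided))
```

The doubled p-value here is compared with $\alpha$; it is the same decision as comparing the one-tail p-value with $\alpha/2$.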

Next the test for equivalence:

    Delta <- 0.05 #Equivalence threshold of +/- 5%.
    # You will want to carefully think about and select your own
    # value for Delta before you conduct your test.

Again, assume g1 and g2 are vectors containing the binomial data for group 1 and group 2 respectively.

    n1 <- length(g1) #sample size group 1
    n2 <- length(g2) #sample size group 2
    p1 <- sum(g1)/n1 #p1 hat
    p2 <- sum(g2)/n2 #p2 hat
    n <- n1 + n2 #overall sample size
    p <- sum(g1, g2)/n #p hat
    cHAeq <- sign(p1-p2)* (1/(2*min(n1, n2)))

    # without continuity correction
    z1 <- (Delta - (p1 - p2))/sqrt(p*(1-p)*(1/n1 + 1/n2)) #test statistic for H01
    z2 <- ((p1 - p2) + Delta)/sqrt(p*(1-p)*(1/n1 + 1/n2)) #test statistic for H02
    pval1 <- 1 - pnorm(z1) #p-value (upper tail) reject H0 
                           #if it is <= alpha (one tail)
    pval2 <- 1 - pnorm(z2) #p-value (upper tail) reject H0 
                           #if it is <= alpha (one tail)

    # with continuity correction
    zHA1 <- (Delta - abs(p1 - p2) + cHAeq)/
            sqrt((p1*(1-p1)/(n1-1)) + (p2*(1-p2)/(n2-1)))
    zHA2 <- (abs(p1 - p2) + Delta - cHAeq)/
            sqrt((p1*(1-p1)/(n1-1)) + (p2*(1-p2)/(n2-1)))
    pvalHA1 <- 1 - pnorm(zHA1) #p-value (upper tail) reject H0 
                               #if it is <= alpha (one tail)
    pvalHA2 <- 1 - pnorm(zHA2) #p-value (upper tail) reject H0 
                               #if it is <= alpha (one tail)

References

Hauck, W. W. and Anderson, S. (1986). A comparison of large-sample confidence interval methods for the difference of two binomial probabilities. The American Statistician, 40(4):318–322.

Tu, D. (1997). Two one-sided tests procedures in establishing therapeutic equivalence with binary clinical endpoints: fixed sample performances and sample size determination. Journal of Statistical Computation and Simulation, 59(3):271–290.

Related Solutions

Equivalence Test – How to Perform Equivalence Test for Binomial Data?

To do an equivalence test, you need some equivalence margins on some appropriate scale. Then you use some method that gives you a valid confidence interval at the desired level (i.e. to perform a test at level $\alpha$ you need a two-sided level $1-\alpha$ confidence interval) and see whether it lies completely within the equivalence boundaries.

E.g. you might want to work with odds ratios and might think that anything within a factor of 0.8 to 1.25 (equal delta in both directions on the log-odds-ratio scale) is not a meaningful difference. You could then use logistic regression to get an estimated odds ratio (0.809 for group 1 vs. 2) and a 95% asymptotic Wald confidence interval (0.542 to 1.206). Since the confidence interval is not completely within 0.8 to 1.25 (i.e. the lower end falls outside it), you would not have shown equivalence (i.e. you cannot reject the null hypothesis of non-equivalence). With other equivalence margins, you might have equivalence.
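The decision rule in this example is just an interval-containment check. A minimal sketch using the numbers quoted above (the helper name is hypothetical, and using strict inequalities at the boundary is an assumption):

```r
# Hypothetical helper: TRUE when the confidence interval lies entirely
# inside the equivalence margins.
ci_within_margins <- function(ci_lower, ci_upper, margin_lower, margin_upper) {
  ci_lower > margin_lower && ci_upper < margin_upper
}

# Margins 0.8 and 1.25 are symmetric on the log-odds-ratio scale
print(all.equal(log(0.8), -log(1.25)))

# Wald CI from the example (0.542 to 1.206) against margins 0.8 to 1.25
shown_narrow <- ci_within_margins(0.542, 1.206, 0.8, 1.25)

# Same CI against wider, purely illustrative margins of 0.5 to 2
shown_wide <- ci_within_margins(0.542, 1.206, 0.5, 2)

print(c(narrow = shown_narrow, wide = shown_wide))
```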

Often we can also derive a p-value for the decision, but if all you want is "reject null hypothesis" vs. "null hypothesis not rejected", then all you need is a confidence interval. When asymptotics apply, you can use the normal approximation to get such a p-value (essentially by finding the confidence level at which the interval would just touch the margin, which is easy to do if the CI is formed as estimated log-odds ratio +/- SE * the appropriate percentile of the normal distribution, such as 1.96), while with sparser data this can be a bit more difficult.

You could of course also decide that you are not interested in an odds ratio and want a risk difference or a risk ratio.
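As a sketch of what that could look like, the code below computes large-sample Wald intervals for the risk difference and the risk ratio from the counts in the question. The margin of 0.15 on the risk-difference scale is an assumption for illustration only, and the 95% level follows the convention used above; the standard errors are the usual unpooled Wald and delta-method (log scale) approximations.

```r
x1 <- 58;  n1 <- 164   # successes / sample size, group 1
x2 <- 113; n2 <- 280   # successes / sample size, group 2
p1 <- x1 / n1
p2 <- x2 / n2

# Risk difference with an unpooled Wald standard error
rd    <- p1 - p2
se_rd <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci_rd <- rd + c(-1, 1) * qnorm(0.975) * se_rd

# Assumed (illustrative) equivalence margin on the risk-difference scale
margin_rd <- 0.15
equivalent_rd <- ci_rd[1] > -margin_rd && ci_rd[2] < margin_rd

# Risk ratio with a delta-method standard error on the log scale
rr        <- p1 / p2
se_log_rr <- sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
ci_rr     <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

print(list(rd = rd, ci_rd = ci_rd, equivalent_rd = equivalent_rd,
           rr = rr, ci_rr = ci_rr))
```

Margins for a risk ratio would be chosen on the ratio scale (for example symmetric on the log scale), in the same way as for the odds ratio.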

    example <- data.frame(group = c(rep(1,164), rep(2,280)),
                          outcome = c(rep(1,58), rep(0,164-58), rep(1,113), rep(0,280-113)))

    glmfit1 <- glm(data = example,
                   formula = outcome ~ factor(group),
                   family = "binomial")

    summary(glmfit1)

    # Get estimate and standard error
    # use negative coefficient because of choice of reference group
    estimate <- -summary(glmfit1)$coefficients[2,1]
    se <- summary(glmfit1)$coefficients[2,2]

    # Wald confidence interval on the odds-ratio scale (lower limit first).
    # You get a test decision simply by comparing your NI margin to the CI
    # limits (if the CI limits extend beyond it, you cannot reject the null
    # hypothesis of non-equivalence).
    # Here we are assuming that you want 95% CIs corresponding to a 5% level test
    wald_ci <- c(exp( estimate - se* qnorm(0.975) ),
                 exp( estimate + se* qnorm(0.975) ))

    # If you really need a p-value, then you can do the following:

    # Let us assume this is your equivalence margin (on the log-odds-ratio scale):
    nimargin <- log(2)
    # p-value using large sample normality assumption (corresponding to Wald confidence limits)
    pvalue <- min(1, 2*(1-pnorm( (abs(nimargin) - abs(estimate))/se, 0, 1)))

The example code above gives you a p-value for equivalence of 0.0183 if your NI margin is log(2) on the logit scale. So for that NI margin you could reject the null hypothesis of non-equivalence and conclude that, within those limits, the groups are equivalent. On the other hand, for an NI margin of log(1.25) (i.e. an odds ratio of 1.25), you would get a p-value from the equivalence test of 0.9579, so you could not reject that null hypothesis.
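Both p-values can be reproduced without fitting a model. For a two-group comparison like this one, the logistic-regression estimate is the log of the sample odds ratio and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below relies on that closed form in place of `glm()`; the helper name `equiv_pvalue` is hypothetical.

```r
# 2x2 cell counts from the example
s1 <- 58;  f1 <- 164 - 58    # successes / failures, group 1
s2 <- 113; f2 <- 280 - 113   # successes / failures, group 2

# Log odds ratio (group 1 vs. 2) and its large-sample Wald standard error
estimate <- log((s1 / f1) / (s2 / f2))
se <- sqrt(1 / s1 + 1 / f1 + 1 / s2 + 1 / f2)

# Hypothetical helper: equivalence p-value for a margin given on the
# log-odds-ratio scale, using the same formula as the code above
equiv_pvalue <- function(estimate, se, margin) {
  min(1, 2 * pnorm((abs(margin) - abs(estimate)) / se, lower.tail = FALSE))
}

p_log2   <- equiv_pvalue(estimate, se, log(2))
p_log125 <- equiv_pvalue(estimate, se, log(1.25))
print(round(c(margin_log2 = p_log2, margin_log1.25 = p_log125), 4))
```

The estimated odds ratio of about 0.809 sits just inside the lower margin of 0.8, which is why the p-value for the log(1.25) margin is close to 1.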
