Solved – Equivalence testing and difference-in-difference measures

difference-in-difference · equivalence · hypothesis-testing · tost

I have some data from a recent experiment that requires slightly more sophisticated testing than I'm used to. Any advice would be most appreciated!

The setup: There are control (C) and treatment (T) samples. For each sample, approximately half was measured for how much they trust a certain category of people 'A' (call this measure '$ta$'), and the other half for how much they trust another type 'B' ($tb$). The measures are discrete, on an equally spaced eight-point scale.

One social theory has suggested that under such a treatment, trust measures should go down towards both A and B. So one should find that $ta(T)$ is less than $ta(C)$ and $tb(T)$ is less than $tb(C)$. Moreover, it should also be that the difference in trust towards the two groups in the treated, $ta(T)-tb(T)$, should be larger than the difference $ta(C)-tb(C)$. Each observation has only one measure; i.e. a treated individual has either $ta(T)$ or $tb(T)$, never both.

I hope to disprove these claims. A t-test or Mann-Whitney test of (e.g.) $ta(T)$ versus $ta(C)$ fails to reject the null (where $H_0: ta(T)=ta(C)$). But as this could just be an underpowered result, I want to test against a null that $ta(C)$ is greater than $ta(T)$ (and then the same for $tb$). That is: $H_0^{(a)}: ta(C)> ta(T)$, $H_0^{(b)}: tb(C)> tb(T)$. So my first question: how would I perform this test of equivalence? I tried this with the tost package, but my problem is that there is no prior data to give an idea of what an acceptable effect size $\Delta$ would be. Because this is a study measuring trust, it's harder to come up with a sensible magnitude for our measure (as compared, I guess, to a pharmacological test).

My second question relates to the difference-in-differences problem. How would I go about testing either equivalence or non-equivalence of the gap between $ta$ and $tb$ across the two samples? For the more routine case where $H_0: ta(C)-tb(C)=ta(T)-tb(T)$, I ran a regression of the trust variable on an interaction regressor (the product of the A-vs-B dummy and the C-vs-T dummy) and tested its significance. Is that correct? For the more difficult test of equivalence, how would I run a test with the null $H_0: ta(C)-tb(C)>ta(T)-tb(T)$, given that each individual falls into only one of these four groups, the groups are of different sizes, and I still can't use a prior estimate for $|\Delta|$? (As mentioned, the control and treatment samples are each split roughly in half with respect to A and B.)
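To make the second question concrete, here is a minimal sketch of that interaction regression with made-up data (Python purely for illustration; all cell sizes and means are invented). It also shows that, in this saturated design, the interaction coefficient is exactly the difference-in-differences of the four cell means:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented cell sizes and means for the four groups: control/treated x A/B.
sizes = {"CA": 40, "CB": 35, "TA": 45, "TB": 38}
means = {"CA": 5.0, "CB": 4.2, "TA": 4.6, "TB": 3.5}

rows = []
for cell, size in sizes.items():
    treat = 1.0 if cell[0] == "T" else 0.0   # C-vs-T dummy
    grp_b = 1.0 if cell[1] == "B" else 0.0   # A-vs-B dummy
    for trust in rng.normal(means[cell], 1.0, size):
        rows.append((1.0, treat, grp_b, treat * grp_b, trust))
data = np.array(rows)
X, y = data[:, :4], data[:, 4]

# OLS: trust = b0 + b1*treat + b2*groupB + b3*(treat x groupB) + error
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# The interaction coefficient b3 equals the difference-in-differences
# of the four cell means.
m = {c: y[(X[:, 1] == float(c[0] == "T")) & (X[:, 2] == float(c[1] == "B"))].mean()
     for c in sizes}
did = (m["TB"] - m["TA"]) - (m["CB"] - m["CA"])
print(beta[3], did)
```

The unequal group sizes are no problem for the point estimate; they only affect the standard errors.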

Sorry for the long question, and also if anything is unclear. It's my first post! 🙂

Best Answer

As I understand it, your model is:

$$\text{SEND} = \beta_{0} + \beta_{pt}pt + \beta_{partner}partner + \beta_{town}town + \mathbf{B}_{controls}\mathbf{controls} + \varepsilon$$

So your estimated effect of treatment on SEND is given by $\widehat{\beta}_{pt}$. Tests for difference are reported in the vanilla output for linear regression in Stata:

  • To the right of $\widehat{\beta}_{pt}$ in the Stata output is "Std. Err.", or $\widehat{\sigma}_{\beta_{pt}}$.

  • To the right of the standard error of the estimate $\widehat{\beta}_{pt}$ is a t test statistic $\left(t= \frac{\widehat{\beta}_{pt}}{\widehat{\sigma}_{\beta_{pt}}} \right)$

  • To the right of the $t$ statistic is the corresponding p-value, $P\left(|T|\ge |t_{\nu}|\right)$, where the degrees of freedom $\nu=n-$no. of parameter estimates (including $\beta_{0}$).

  • (To the right of all these is the 95% CI.)
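For readers without Stata at hand, those columns can be reproduced from $\widehat{\beta}_{pt}$ and its standard error alone; here is a sketch in Python (the numbers are invented, standing in for your regression output):

```python
from scipy import stats

# Invented regression output values, standing in for Stata's table:
beta_hat = 0.08   # estimated coefficient on pt
se_beta  = 0.10   # "Std. Err."
nu       = 187    # degrees of freedom: n - no. of parameter estimates

t = beta_hat / se_beta                      # the "t" column
p = 2 * stats.t.sf(abs(t), df=nu)           # the two-sided "P>|t|" column
half = stats.t.isf(0.025, df=nu) * se_beta  # half-width of the 95% CI
ci = (beta_hat - half, beta_hat + half)
print(t, p, ci)
```

With these numbers the test fails to reject, and correspondingly the 95% CI straddles zero.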

You can formulate an equivalence test for $\beta_{pt}$ (or any of the parameter estimates) in two ways: in units of the parameter (e.g. the slope of $pt$ vs. $SEND$), or in units of the $t$ distribution. Using Stata's tostti command (see the tost package) you specify units of the parameter using the eqvtype(delta) option, and specify units of the $t$ distribution using the eqvtype(epsilon) option.

#Formulating a test for equivalence in terms of $\Delta$:

The general negativist null hypothesis is $H^{-}_{0}: |\beta_{pt}| \ge \Delta$ (i.e. the test asks whether $\beta_{pt}$ is equivalent to $0$ within an equivalence threshold of $\Delta$), with $H^{-}_{\text{A}}: |\beta_{pt}| < \Delta$, and the corresponding specific null hypotheses for two one-sided tests are:

  • $H^{-}_{01}: \beta_{pt} \ge \Delta$, with $H^{-}_{\text{A}1}: \beta_{pt} < \Delta$, and

  • $H^{-}_{02}: \beta_{pt} \le -\Delta$, with $H^{-}_{\text{A}2}: \beta_{pt} > -\Delta$

The corresponding test statistics for these two null hypotheses are:

  • $t_{1} = \frac{\Delta - \widehat{\beta}_{pt}}{\widehat{\sigma}_{\beta_{pt}}}$, and

  • $t_{2} = \frac{\widehat{\beta}_{pt} + \Delta}{\widehat{\sigma}_{\beta_{pt}}}$

These test statistics are both constructed to be upper tail tests, so:

  • $p_{1} = P(T_{\nu}>t_{1})$, and
  • $p_{2} = P(T_{\nu}>t_{2})$.

You reject $H^{-}_{0}$ only if both $p_{1}\le \alpha$ and $p_{2} \le \alpha$; if you do, you would conclude that you found evidence that $\beta_{pt}$ is equivalent to $0$ within $\pm \Delta$ at the $\alpha$ level of significance.

You can conduct this test for equivalence using tostti in Stata: tostti #obs #mean #sd 0, eqvtype(delta) eqvlevel(#), where:

  • #obs is $n-$no. of variables in your regression model (I think I count 13 in your case?)... basically it's the degrees of freedom+1.

  • #mean is $\beta_{pt}$

  • Updated: (I forgot that tostti expects the SD, not the SE) #sd is $\widehat{\sigma}_{\beta_{pt}}\times\sqrt{n}$

  • The # in the eqvlevel option is your value of $\Delta$ (I am assuming you want a symmetrical equivalence region, if not, check out the help file's uppereqvlevel() option). See my remarks on specific values of $\Delta$ below.
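If you want to check tostti's arithmetic by hand (or you work outside Stata), the delta-based TOST above reduces to a few lines; a sketch in Python with invented numbers:

```python
from scipy import stats

# Invented numbers standing in for your regression output:
beta_hat = 0.08   # estimated beta_pt
se_beta  = 0.10   # its standard error
nu       = 187    # residual degrees of freedom
delta    = 0.30   # chosen equivalence threshold, in units of SEND per pt
alpha    = 0.05

# Two one-sided tests against H0-: |beta_pt| >= delta
t1 = (delta - beta_hat) / se_beta   # tests H01-: beta_pt >= delta
t2 = (beta_hat + delta) / se_beta   # tests H02-: beta_pt <= -delta
p1 = stats.t.sf(t1, df=nu)          # both are upper-tail tests
p2 = stats.t.sf(t2, df=nu)

# Reject H0- (equivalence within +/- delta) only if BOTH p-values <= alpha.
equivalent = p1 <= alpha and p2 <= alpha
print(p1, p2, equivalent)
```

With these invented values both one-sided tests reject, so you would conclude equivalence within $\pm 0.30$.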

#Formulating a test for equivalence in terms of $\varepsilon$:

The general negativist null hypothesis is $H^{-}_{0}: |t| \ge \varepsilon$ (i.e. the test asks whether $t$ is equivalent to $0$ within an equivalence threshold of $\varepsilon$), with $H^{-}_{\text{A}}: |t| < \varepsilon$, and the corresponding specific null hypotheses for two one-sided tests are:

  • $H^{-}_{01}: t \ge \varepsilon$, with $H^{-}_{\text{A}1}: t < \varepsilon$, and

  • $H^{-}_{02}: t \le -\varepsilon$, with $H^{-}_{\text{A}2}: t > -\varepsilon$

The corresponding test statistics for these two null hypotheses are:

  • $t_{1} = \varepsilon-t$, and

  • $t_{2} = t+\varepsilon$, where the $t$ for both these tests is the one reported to the right of $\widehat{\beta}_{pt}$ in the Stata output.

These test statistics are both constructed to be upper tail tests, so:

  • $p_{1} = P(T_{\nu}>t_{1})$, and
  • $p_{2} = P(T_{\nu}>t_{2})$.

You reject $H^{-}_{0}$ only if both $p_{1}\le \alpha$ and $p_{2} \le \alpha$; if you do, you would conclude that you found evidence that $t$ is equivalent to $0$ within $\pm \varepsilon$ at the $\alpha$ level of significance.

You can conduct this test for equivalence using tostti in Stata: tostti #obs #mean #sd 0, eqvtype(epsilon) eqvlevel(#), where:

  • #obs is $n-$no. of variables in your regression model (I think I count 13 in your case?)... basically it's the degrees of freedom+1.

  • #mean is $\beta_{pt}$

  • Updated: (I forgot that tostti expects the SD, not the SE) #sd is $\widehat{\sigma}_{\beta_{pt}}\times \sqrt{n}$

  • The # in the eqvlevel option is your value of $\varepsilon$ (I am assuming you want a symmetrical equivalence region, if not, check out the help file's uppereqvlevel() option). See my remarks on specific values of $\varepsilon$ below.
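The epsilon-based version is even simpler, since it works directly on the reported $t$ statistic; a sketch with invented numbers:

```python
from scipy import stats

# Invented numbers: t_obs is the statistic next to beta_pt in the output.
t_obs   = 0.8
nu      = 187    # residual degrees of freedom
epsilon = 3.0    # equivalence threshold in units of the t distribution
alpha   = 0.05

# Two one-sided tests against H0-: |t| >= epsilon
p1 = stats.t.sf(epsilon - t_obs, df=nu)   # tests H01-: t >= epsilon
p2 = stats.t.sf(t_obs + epsilon, df=nu)   # tests H02-: t <= -epsilon
equivalent = p1 <= alpha and p2 <= alpha
print(p1, p2, equivalent)
```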

#Relevance testing

The hella cool application of tests for equivalence is to base inference on both a test for equivalence and a test for difference (this is termed "relevance testing"). Four results can obtain:

  1. Reject $H^{+}_{0}$ & not reject $H^{-}_{0}$: conclude relevant difference
  2. Not reject $H^{+}_{0}$ & reject $H^{-}_{0}$: conclude equivalence
  3. Reject $H^{+}_{0}$ & reject $H^{-}_{0}$: conclude trivial difference (i.e. you found evidence of a difference that you said a priori is too small to care about)
  4. Not reject $H^{+}_{0}$ & not reject $H^{-}_{0}$: conclude indeterminate (i.e. your data are under-powered for your test, and you can say nothing about difference or equivalence)

You obtain relevance tests in tostti by including the relevance option.
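Outside of tostti, the four-way relevance logic is easy to wire up yourself; a sketch in Python (the function name and numbers are my own invention, not part of the tost package):

```python
from scipy import stats

def relevance_test(t_obs, nu, epsilon, alpha=0.05):
    """Combine a test for difference with an epsilon-based TOST for
    equivalence and return one of the four relevance-test conclusions."""
    # Test for difference: H0+: beta = 0 (two-sided)
    p_diff = 2 * stats.t.sf(abs(t_obs), df=nu)
    # Test for equivalence: H0-: |t| >= epsilon (two one-sided tests)
    p1 = stats.t.sf(epsilon - t_obs, df=nu)
    p2 = stats.t.sf(t_obs + epsilon, df=nu)
    reject_diff = p_diff <= alpha
    reject_equiv = p1 <= alpha and p2 <= alpha
    if reject_diff and not reject_equiv:
        return "relevant difference"
    if not reject_diff and reject_equiv:
        return "equivalence"
    if reject_diff and reject_equiv:
        return "trivial difference"
    return "indeterminate"

print(relevance_test(t_obs=0.5, nu=187, epsilon=3.0))   # equivalence
print(relevance_test(t_obs=6.0, nu=187, epsilon=3.0))   # relevant difference
```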

#Specific values of $\Delta$ & $\varepsilon$

The value of either $\Delta$ or $\varepsilon$ is a researcher's choice. As you point out, there is no a priori literature on what size a "relevant effect" of $\beta_{pt}$ is: if you define equivalence in terms of $\Delta$, then you are using units of $SEND/pt$. Defining equivalence/relevance in terms of $t$ is (a) perhaps a little easier to do in this situation, and (b) a little more abstract. Some points about selecting a value of $\varepsilon$:

  • It is impossible to reject any $H^{-}_{0}$ if $\varepsilon\le t_{\alpha\nu}$, so $\varepsilon$ should be thought of as $t_{\alpha\nu}+\text{something}$.
  • I like to think of the something as "how much greater the magnitude of $t$ would have to be in order to be relevant".
  • Half a standard deviation seems a fairly liberal definition of the equivalence/relevance threshold, or $\varepsilon = t_{\alpha\nu} + 0.5\sqrt{\nu/(\nu-2)}$ (the standard deviation of $t$ is $\sqrt{\nu/(\nu-2)}$, so I have added half of that to $t_{\alpha\nu}$). You can obtain $t_{\alpha\nu}$ in Stata with: di invttail(df,alpha), where df is your degrees of freedom (no. of observations - no. of parameter estimates, including the _cons).

A strict definition of the equivalence/relevance threshold might use $0.25\sqrt{\nu/(\nu-2)}$, and a very strict one $0.125\sqrt{\nu/(\nu-2)}$.
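Putting those thresholds into numbers, here is how you might compute $t_{\alpha\nu}$ and the three candidate values of $\varepsilon$ (Python stands in for Stata's di invttail; the degrees of freedom are invented):

```python
import math
from scipy import stats

nu = 187                              # invented degrees of freedom
alpha = 0.05
t_crit = stats.t.isf(alpha, df=nu)    # same value as Stata's invttail(df, alpha)
sd_t = math.sqrt(nu / (nu - 2))       # standard deviation of a t with nu df

# Liberal / strict / very strict equivalence thresholds in t units:
eps_liberal     = t_crit + 0.500 * sd_t
eps_strict      = t_crit + 0.250 * sd_t
eps_very_strict = t_crit + 0.125 * sd_t
print(t_crit, eps_liberal, eps_strict, eps_very_strict)
```

Note that all three thresholds sit above $t_{\alpha\nu}$, as the first bullet above requires.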

You can of course conduct equivalence tests (and relevance tests!) for any of the parameters estimated in your model. You might find useful my answer to Peter Flom's question to get an idea about presenting many equivalence tests in a regression context.

Note: linear regression makes no assumption of normality of the dependent or independent variables; rather, linear regression assumes the residuals are normally distributed (and normally distributed residuals do not require any particular distribution of dependent or independent variables).