Solved – Paired two one-sided t-tests (TOST) with unequal sample sizes

Tags: equivalence, paired-comparisons, sample-size, t-test, tost

I have data from two populations (let's call them "before" and "after"), collected in a paired fashion. That is, I take a measurement under one set of conditions before, and then I can take multiple measurements under the same set of conditions after. For example, the "pairs" in this case might consist of $(X^{b,i}, (X^{a,i}_{1}, X^{a,i}_{2}))$, where $X^{b,i}$ is the measurement from the "before" population taken under conditions i, and $X^{a,i}_1$ and $X^{a,i}_2$ are two measurements from the "after" population taken under conditions i. I have this data for $i=1, \ldots, n$ sets of conditions.

I want to test for equivalence of the means of the two populations. A paired two one-sided t-test (TOST) procedure seems appropriate, and I've been using the implementation available in the statsmodels Python package.

How should I handle the unequal sample sizes when performing a paired TOST?

My current approach is to turn each $(X^{b,i}, (X^{a,i}_{1}, X^{a,i}_{2}))$ triple into two pairs: $(X^{b,i}, X^{a,i}_{1})$ and $(X^{b,i}, X^{a,i}_{2})$. Then I run a paired TOST on this new, larger set of $2n$ pairs. Is this appropriate? I worry that the new pairs are not independent (because they share $X^{b,i}$) and that this will invalidate the approach.
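For concreteness, the expansion described above can be sketched with `statsmodels.stats.weightstats.ttost_paired`. The data, the sample size, and the equivalence margin below are all made up for illustration:

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_paired

rng = np.random.default_rng(0)
n = 30  # number of condition sets (illustrative)

# Simulated data: one "before" and two "after" measurements per condition set.
before = rng.normal(10.0, 1.0, size=n)
after = before[:, None] + rng.normal(0.0, 0.5, size=(n, 2))

# Expand each triple into two pairs, duplicating the "before" value.
x_before = np.repeat(before, 2)  # length 2n
x_after = after.ravel()          # length 2n

# Paired TOST with illustrative equivalence bounds of +/- 0.5.
pval, t_lower, t_upper = ttost_paired(x_after, x_before, low=-0.5, upp=0.5)
```

Note that this is exactly the procedure being questioned: the $2n$ pairs are treated as independent even though they are not.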

Best Answer

I don't know any specific references for this case.

In analogy with some of the methods for repeated measures ANOVA, the relevant t-test would use the mean of the two 'after' observations and compare it with the 'before' observation. The variance of the average within-pair difference shrinks as the number of observations per individual grows, so the test still takes the larger sample into account.
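A minimal sketch of this averaging approach, again with simulated data and an illustrative equivalence margin:

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_paired

rng = np.random.default_rng(1)
n = 30  # number of condition sets (illustrative)
before = rng.normal(10.0, 1.0, size=n)
after = before[:, None] + rng.normal(0.0, 0.5, size=(n, 2))

# Average the 'after' replicates within each condition set, then run a
# standard paired TOST on the n (after-mean, before) pairs.
after_mean = after.mean(axis=1)
pval, t_lower, t_upper = ttost_paired(after_mean, before, low=-0.5, upp=0.5)
```

This reduces the problem to an ordinary paired TOST with $n$ independent pairs, so no correction for dependence is needed.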

An alternative approach would be to use cluster-robust standard errors for the two pair differences, as in your approach: fit OLS on the $2n$ pair differences and specify the individual (the condition set $i$) as the grouping variable. OLS in statsmodels only provides a t-test after estimation, but the TOST rejection decision can still be obtained by checking whether the $1 - 2\alpha$ confidence interval lies inside the equivalence boundaries.

About cluster-robust standard errors:

OLS provides a consistent estimator of the parameters of the linear model even if there is correlation across observations or heteroscedasticity. However, the usual estimate of the standard errors, or of the covariance of the parameter estimates, is then incorrect. One solution is to keep the OLS parameter estimates but correct the standard errors using a sandwich form of robust standard errors.

For example, here is the Wikipedia page on heteroscedasticity-robust standard errors: http://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors

For the specific case where we have correlation within small groups or clusters but no correlation across groups, we can use cluster-robust standard errors to correct for the within-cluster correlation. An extensive discussion is available in

Cameron, A. Colin, and Douglas L. Miller. "A Practitioner's Guide to Cluster-Robust Inference." Journal of Human Resources 50.2 (2015): 317–372.

(Aside: statsmodels provides robust covariance matrices for the linear models OLS and WLS, for discrete models such as Logit and Poisson, and for GLM. Cluster-robust standard errors are the default for GEE. The list of available types is here: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.RegressionResults.get_robustcov_results.html )
