Suppose we have a multiple comparisons scenario, such as post hoc inference on pairwise statistics or a multiple regression, in which we are making a total of $m$ comparisons. Suppose also that we would like to support inference in these multiples using confidence intervals.
1. Do we apply multiple comparison adjustments to CIs? That is, just as multiple comparisons compel a redefinition of $\alpha$ to either the family-wise error rate (FWER) or the false discovery rate (FDR), does the meaning of confidence (or credibility¹, or uncertainty, or prediction, or inferential… pick your interval) get similarly altered by multiple comparisons? I realize that a negative answer here will moot my remaining questions.
2. Are there straightforward translations of multiple comparison adjustment procedures from hypothesis testing to interval estimation? For example, would adjustments focus on changing the $\text{CI-level}$ term in the confidence interval: $\text{CI}_{\theta} = (\hat{\theta} \pm t_{(1-\text{CI-level})/2}\,\hat{\sigma}_{\theta})$?
3. How would we address step-up or step-down control procedures for CIs? Some family-wise error rate adjustments from the hypothesis testing approach to inference are 'static' in that precisely the same adjustment is made to each separate inference. For example, the Bonferroni adjustment is made by altering the rejection criterion from:
- reject if $p\le \frac{\alpha}{2}$ to:
- reject if $p\le \frac{\frac{\alpha}{2}}{m}$,
but Holm's step-down adjustment (written here with the Šidák-type threshold) is not 'static'; rather, it is made by:
- first ordering $p$-values smallest to largest, and then
- reject if $p\le 1 - (1- \frac{\alpha}{2})^{\frac{1}{m+1-i}}$ (where $i$ indexes the ordering of the $p$-values), until
- we fail to reject a null hypothesis, and automatically fail to reject all subsequent null hypotheses.
Because rejection/failure to reject is not happening with CIs (more formally, see the references below), does that mean that stepwise procedures, including all of the FDR methods, don't translate? I ought to caveat here that I am not asking how to translate CIs into hypothesis tests (the representatives of the 'visual hypothesis testing' literature cited below get at that non-trivial question).
4. What about any of those other intervals I mentioned parenthetically in 1?
¹ Gosh, I sure hope I don't get in trouble with those rockin' the sweet, sweet Bayesian styles by using this word here.
References
Afshartous, D. and Preston, R. (2010). Confidence intervals for dependent data: Equating non-overlap with statistical significance. Computational Statistics & Data Analysis, 54(10):2296–2305.
Cumming, G. (2009). Inference by eye: reading the overlap of independent confidence intervals. Statistics In Medicine, 28(2):205–220.
Payton, M. E., Greenstone, M. H., and Schenker, N. (2003). Overlapping confidence intervals or standard error intervals: What do they mean in terms of statistical significance? Journal of Insect Science, 3(34):1–6.
Tryon, W. W. and Lewis, C. (2008). An inferential confidence interval method of establishing statistical equivalence that corrects Tryon's (2001) reduction factor. Psychological Methods, 13(3):272–277.
Best Answer
An excellent topic which is, sadly, not given enough attention.
When discussing multiple parameters and confidence intervals, a distinction should be made between simultaneous inference and selective inference. Ref. [2] gives an excellent demonstration of the matter.
Simultaneous confidence intervals are constructed so that all the parameters are covered jointly with probability at least $1-\alpha$.
Selective confidence intervals are constructed only for a subset of parameters selected after looking at the data, and the coverage statement refers to that selected subset.
These two concepts can be combined: Say you construct intervals only on parameters for which you rejected the null hypothesis. You are clearly dealing with selective inference. You may want to guarantee simultaneous coverage of selected parameters, or marginal coverage of selected parameters. The former would be the counterpart of FWER control, and the latter of FDR control.
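As a minimal sketch of the simultaneous case (an illustration, not taken from the references): the Bonferroni approach builds each of the $m$ intervals at level $1-\alpha/m$ instead of $1-\alpha$, which guarantees joint coverage of at least $1-\alpha$. The function name `bonferroni_intervals` is hypothetical, and the code assumes approximately normal estimates with known standard errors.

```python
from statistics import NormalDist


def bonferroni_intervals(estimates, std_errors, alpha=0.05):
    """Two-sided normal-theory intervals with joint coverage of at least 1 - alpha.

    Assumption: each estimate is approximately normal with a known standard error.
    """
    m = len(estimates)
    # Each interval is built at level 1 - alpha/m, i.e. alpha/(2m) in each tail.
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return [(est - z * se, est + z * se) for est, se in zip(estimates, std_errors)]


# Hypothetical estimates and standard errors for m = 5 comparisons.
intervals = bonferroni_intervals([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5)
for low, high in intervals:
    print(f"({low:.3f}, {high:.3f})")
```

Every interval gets the same widened critical value, which is why this kind of 'static' adjustment carries over from tests to intervals without difficulty.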
Now more to the point: Not all testing procedures have their accompanying intervals. For FWER procedures and their accompanying intervals, see [3]. Sadly, this reference is a bit outdated. For the interval counterpart of BH FDR control, see [1] and an application in [4] (which also includes a brief review of the matter). Please note that this is an active research field, so you can expect more results in the near future.
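To give a flavor of the interval counterpart of BH, here is a rough sketch of the adjustment as I understand it from [1]: select parameters with BH at level $q$, count the $R$ selected, and build a marginal interval at level $1 - Rq/m$ for each selected parameter. The function names are hypothetical, the normal approximation and the null value of zero are assumptions of this sketch, and the coverage guarantee holds only under the conditions stated in [1].

```python
from statistics import NormalDist


def bh_select(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Step-up: find the largest rank whose p-value is below rank * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def fcr_adjusted_intervals(estimates, std_errors, q=0.05):
    """Intervals for BH-selected parameters at level 1 - R*q/m (sketch).

    Assumptions: normal estimates, known standard errors, H0: theta_i = 0.
    """
    std = NormalDist()
    m = len(estimates)
    p_values = [2 * (1 - std.cdf(abs(est / se)))
                for est, se in zip(estimates, std_errors)]
    selected = bh_select(p_values, q)
    R = len(selected)
    if R == 0:
        return {}  # nothing selected, so no intervals are reported
    z = std.inv_cdf(1 - R * q / (2 * m))
    return {i: (estimates[i] - z * std_errors[i], estimates[i] + z * std_errors[i])
            for i in selected}


# Hypothetical data: two clear signals among five parameters.
print(fcr_adjusted_intervals([4.0, 3.5, 0.2, -0.1, 0.5], [1.0] * 5))
```

Note how the level depends on the data through $R$: if everything is selected ($R = m$) the intervals are the usual $1-q$ intervals, and if a single parameter is selected they are as wide as Bonferroni intervals at level $1 - q/m$.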
[1] Benjamini, Y., and D. Yekutieli. "False Discovery Rate-Adjusted Multiple Confidence Intervals for Selected Parameters." Journal of the American Statistical Association 100, no. 469 (2005): 71–81.
[2] Cox, D. R. "A Remark on Multiple Comparison Methods." Technometrics 7, no. 2 (1965): 223–24.
[3] Hochberg, Y., and A. C. Tamhane. Multiple Comparison Procedures. New York, NY, USA: John Wiley & Sons, Inc., 1987.
[4] Rosenblatt, J. D., and Y. Benjamini. "Selective Correlations; Not Voodoo." NeuroImage 103 (December 2014): 401–10.