An excellent topic which is, sadly, not given enough attention.
When discussing multiple parameters and confidence intervals, a distinction should be made between simultaneous inference and selective inference. Reference [2] gives an excellent demonstration of the matter.
Simultaneous confidence intervals cover all the parameters at once with $1-\alpha$ confidence.
Selective confidence intervals cover only a subset of parameters that was selected after looking at the data.
These two concepts can be combined:
Say you construct intervals only on parameters for which you rejected the null hypothesis. You are clearly dealing with selective inference. You may want to guarantee simultaneous coverage of selected parameters, or marginal coverage of selected parameters. The former would be the counterpart of FWER control, and the latter of FDR control.
Now more to the point:
Not all testing procedures have their accompanying intervals.
For FWER procedures and their accompanying intervals, see [3]. Sadly, this reference is a bit outdated.
For the interval counterpart of BH FDR control, see [1] and an application in [4] (which also includes a brief review of the matter).
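For intuition, here is a minimal sketch of the FCR (false coverage rate) procedure of [1]: select parameters with the BH procedure at level $q$, then construct each selected interval at the widened marginal level $1 - kq/m$, where $k$ is the number of selections out of $m$ parameters. The Python function name is hypothetical, and the sketch assumes approximately normal estimates with known standard errors; the paper treats the problem in much greater generality.

```python
import numpy as np
from scipy import stats

def fcr_adjusted_intervals(estimates, std_errors, q=0.05):
    """Sketch of FCR-adjusted intervals for BH-selected parameters [1].

    Select with BH at level q, then build each selected interval at
    marginal level 1 - k*q/m (k = number selected, m = number tested).
    Assumes approximately normal estimates with known standard errors.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = len(est)
    pvals = 2 * stats.norm.sf(np.abs(est / se))   # two-sided p-values

    # BH step-up: largest k such that p_(k) <= k*q/m
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    if k == 0:
        return []
    zcrit = stats.norm.ppf(1 - k * q / (2 * m))   # wider than the usual 1.96
    return [(int(i), est[i] - zcrit * se[i], est[i] + zcrit * se[i])
            for i in order[:k]]
```

Note that every selected interval is widened by the same factor; the adjustment depends on how many parameters were selected, not on which ones.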
Please note that this is an active research field, so you can expect more results in the near future.
[1] Benjamini, Y., and D. Yekutieli. “False Discovery Rate-Adjusted Multiple Confidence Intervals for Selected Parameters.” Journal of the American Statistical Association 100, no. 469 (2005): 71–81.
[2] Cox, D. R. “A Remark on Multiple Comparison Methods.” Technometrics 7, no. 2 (1965): 223–24.
[3] Hochberg, Y., and A. C. Tamhane. Multiple Comparison Procedures. New York, NY, USA: John Wiley & Sons, Inc., 1987.
[4] Rosenblatt, J. D., and Y. Benjamini. “Selective Correlations; Not Voodoo.” NeuroImage 103 (December 2014): 401–10.
For an R package, you might take a look at `lsmeans`. For `mlm` models, it sets up the multivariate response as if it were a factor whose levels are the dimensions of the response. Then you can do estimates or contrasts of those, with or without other factors being involved. See the example for the `MOats` dataset that accompanies the package.

It also supports equivalence tests via a `delta` argument in `summary` or `test`. A section of the vignette (see `vignette("using-lsmeans")`) covers equivalence testing.
Best Answer
Let us simulate $100$ realisations of $N(0, 1)$ and test the null hypothesis $H_0: \mu=0$ with the t-test. If this is done many times, $H_0$ should be rejected approximately $5\%$ of the time.
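The original code block did not survive; a sketch in Python (the choice of $10{,}000$ replications is mine) that checks the $5\%$ rejection rate empirically:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, n = 10_000, 100

rejections = 0
for _ in range(n_sim):
    x = rng.standard_normal(n)                 # 100 realisations of N(0, 1)
    p = stats.ttest_1samp(x, 0.0).pvalue       # two-sided t-test of H0: mu = 0
    rejections += p < 0.05

print(rejections / n_sim)                      # should be close to 0.05
```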
If we now simulate $5$ independent samples and test each of the $5$ null hypotheses $H_0: \mu_i = 0$ at level $5\%$, then the probability of at least one significant result is larger than $5\%$; it is $1 - (1 - 0.05)^5 \approx 0.226$.
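Again the code block is missing; a sketch of the five-test case under the same assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n, m = 10_000, 100, 5

any_rejection = 0
for _ in range(n_sim):
    x = rng.standard_normal((m, n))                   # 5 independent samples
    pvals = stats.ttest_1samp(x, 0.0, axis=1).pvalue  # one p-value per sample
    any_rejection += (pvals < 0.05).any()             # any uncorrected rejection?

print(any_rejection / n_sim)   # should be close to 1 - 0.95**5 = 0.226
```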
But if you use the Bonferroni correction, i.e. test each of the $5$ hypotheses at level $0.05/5 = 0.01$, then the probability of at least one false rejection drops back to approximately $5\%$ (here $1 - (1 - 0.01)^5 \approx 0.049$).
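The Bonferroni correction can be checked with the same simulation, comparing each p-value against $\alpha/m$ instead of $\alpha$ (a sketch; replication count and seed are mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sim, n, m = 10_000, 100, 5
alpha = 0.05

any_rejection = 0
for _ in range(n_sim):
    x = rng.standard_normal((m, n))                   # 5 independent samples
    pvals = stats.ttest_1samp(x, 0.0, axis=1).pvalue
    any_rejection += (pvals < alpha / m).any()        # Bonferroni: alpha/m per test

print(any_rejection / n_sim)   # familywise error rate back to about 0.05
```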