Solved – 85% Bonferroni family-wide (simultaneous) CI for lm in R

bonferroniconfidence intervallmr

The code

confint(model, level = 1 -0.15/length(coef(model)))

However, I have two questions:

How do I avoid including the intercept? For example, here is my output:

> confint(ACFL.m1, level = 1 -0.15/length(coef(ACFL.m1)))
               1.87 %    98.12 %
(Intercept)   1.5838374  3.2846837
PDCD          8.6257799 12.3346769
PMIX        -17.6815440  6.8408247
RTCL          0.2888126  0.6987631

I only want 85% CI for my predictors and not the intercept?

In the above example, level = 1-0.15/4 = 0.9625. I know, to get 85% CI for each predictor, the individual CI has to be > 85%, but how do I understand 0.9625% for each summing to as an 85% simultaneous CI?

Best Answer

The answer to question 1:

I learned that the package multcomp allows calculation of simultaneous CIs without the intercept. Details about the package are in https://cran.r-project.org/web/packages/multcomp/multcomp.pdf

Its' default is the single-step method and is considered to be more powerful than the Bonferroni method (Bretz et al. 2011). The results of the default method is shown below:

> K <- cbind(0, diag(length(coef(ACFL.m1))-1))
> rownames(K) <- names(coef(ACFL.m1)) [-1]
> ACLF.mc <- glht(ACFL.m1, linfct = K)
> confint(ACLF.mc, level = 0.85)

Simultaneous Confidence Intervals

Fit: lm(formula = PINDX ~ PDCD + PMIX + RTCL, data = ACFL)

Quantile = 1.9021
85% family-wise confidence level


Linear Hypotheses:
           Estimate   lwr      upr     
PDCD == 0  10.4802   8.7910  12.1695
PMIX == 0  -5.4204 -16.5895   5.7488
RTCL == 0   0.4938   0.3071   0.6805

To get the Bonferroni 85% CI, I had to first run the Bonferroni Test

ACFL.bf <- summary(ACLF.mc, test= adjusted(type = "bonferroni"))

Then calculate the confidence interval

confint(ACFL.bf, level = 0.85)

I got the following output:

 Simultaneous Confidence Intervals

Fit: lm(formula = PINDX ~ PDCD + PMIX + RTCL, data = ACFL)

Quantile = 1.9021
85% family-wise confidence level


Linear Hypotheses:
           Estimate   lwr      upr     
PDCD == 0  10.4802   8.7909  12.1696
PMIX == 0  -5.4204 -16.5898   5.7491
RTCL == 0   0.4938   0.3071   0.6805

Good reference for the multcomp package:

Bretz F, Hothorn T, Westfall PH (2011) Multiple comparisons using R. CRC Press, Boca Raton, FL

Hothorn T (2017) Additional multcomp examples. Accessed online form the R project website https://cran.r-project.org/web/packages/multcomp/vignettes/multcomp-examples.pdf

Hothorn T, Bretz F, Westfall P (2008) Simultaneous inference in general parametric models. Biom J 50:346–363

The answer to question 2:

Based on the formula above, my 85% family-wise CI = 1-0.15/4 = 0.9625% for each variable. I now understand how the lower (1.87 =(1-0.9625)/2) and upper bound values (98.12= ((1-0.9625)/2)+0.9625) came about. I had made a fundamental error in assuming that the 96.25% was a sum of CIs for each variable. I should not have been confused between the 'Bonferroni' and the 'sequential Bonferroni' (Holm, 1979). Now I understand that each variable is tested at 96.25% so that the family-wise error is 85%. I have left the question as it is so others with similar questions can benefit.

Related Solutions

Solved – Identifying structural breaks in regression with Chow test

The strucchange package contains Chow and F tests for structural changes in regression models. The package comes with a vignette which shows how to use the package.

Solved – Testing simultaneous and lagged effects in longitudinal mixed models with time-varying covariates

I know this is probably too late for your benefit, but perhaps for others I will provide an answer.

You can include time-varying covariates in a longitudinal random-effects models (see Applied Longitudinal Analysis by Fitzmaurice, Laird and Ware, 2011 and http://www.ats.ucla.edu/stat/r/examples/alda/ specifically for R – use lme). Interpretation of trends depends on if you code time as categorical or continuous and your interaction terms. So for instance, if time is continuous and your covariates x1 and x2 are binary (0 and 1) and time-dependent, the fixed model is:

$$yij = \beta_0 + \beta_1x_{1ij} + \beta_2x_{2ij} + \beta_3time_{ij} + \beta_4 \times (x_{1ij} * time_{ij}) + \beta_5 \times (x_{2ij} * time_{ij})$$

i is for ith person, j is for jth occasion

$\beta_4$ and $\beta_5$ capture the difference in trends between levels of $x_1$ and $x_2$ while accounting for change over time in $x_1$ and $x_2$. Unless you specify $x_1$ and $x_2$ as random effects, correlations between the repeated measures will not be taken into account (but this needs to be based on theory and can get messy if you have too many random effects - i.e., model won’t converge). There is also some discussion about centering time-dependent covariates to remove bias, although I have not done this (Raudenbush & Bryk, 2002). Interpretation, in general, is also more difficult if you have a continuous time-dependent covariate.

$\beta_1$ and $\beta_2$ capture the cross-sectional association between $x_1$ and $y$ and $x_2$ and $y$ at the intercept ($\beta_0$). The intercept is where time is zero (baseline or wherever you centered your time variable). This interpretation could also be changed if you have a higher order model (e.g., quadratic).

You would code this in R as something like:

model<- lme(y ~ time*x1 + time*x2, data, random= ~time|subject, method="")

Singer and Willet appear to use ML for “method” but I have always been taught to use REML in SAS for overall results but compare the fit of different models using ML. I would imagine you could use REML in R too.

You can also model the correlation structure for y by adding to the previous code:

correlation = [you’ll have to look up the options]

I am not sure I understand your reasoning for only being able to test lagged effects. I am not familiar with modeling lagged effects so I can’t really speak to that here. Perhaps I am wrong, but I would imagine that modeling lagged effects would undermine the usefulness of mixed models (e.g., being able to include subjects with missing time-dependent data)

Best Answer

Related Solutions

Solved – Identifying structural breaks in regression with Chow test

Solved – Testing simultaneous and lagged effects in longitudinal mixed models with time-varying covariates

Related Question