Solved – Understanding confidence intervals in Firth penalized logistic regression

Tags: confidence-interval, interpretation, likelihood-ratio, logistic, regularization

I recently discovered penalized likelihood-ratio methods for coping with sparse and/or separated data.

I'm having some trouble, though, understanding the results that a logistic regression using Firth's method (package logistf in R) gives back.

I have a dataset with an outcome and a predictor, both dichotomous:

      Y.yes Y.no
X.yes 0     22
X.no  7     356

I perform the regressions using logistf(), switching the pl and firth parameters.

  • pl specifies whether confidence intervals and tests should be based on the profile penalized log-likelihood (pl=TRUE) or on the Wald method (pl=FALSE).
  • firth specifies whether Firth's penalized maximum likelihood (firth=TRUE) or the standard maximum-likelihood method (firth=FALSE) is used for the logistic regression.

Coefficients, CIs, and p values for X.yes versus X.no are shown.

1. pl=T;firth=T    coef:-3.5    ci.low:-11.6   ci.high:4.85   p.val:1
2. pl=F;firth=T    coef:-3.5    ci.low:-7.89   ci.high:0.88   p.val:0.117
3. pl=T;firth=F    coef:-2.96   ci.low:-11.1   ci.high:1.08   p.val:1
4. pl=F;firth=F    coef:-2.96   ci.low:-8.17   ci.high:2.24   p.val:0.264

I also show results for a classical ML logistic regression via glm():

5. glm   coef:-14.6   ci.low:NA   ci.high:118   p.val:0.992

As we can see, the CIs are improved in logistf: we no longer get extreme, NA, or infinite values. Still, I'm not sure which option to choose. The fourth model, with both options set to FALSE, should in my understanding be identical to a plain glm(), yet it clearly isn't. I also don't understand why the Wald CI method (models 2 and 4) improves the p values so much compared with the profile penalized likelihood (models 1 and 3). Moreover, the Wald method also gives shorter CIs.
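One property worth noticing here (a quick arithmetic sketch in base R, using the numbers from model 2 above; the standard error is back-computed from the reported interval, not taken from logistf output): a Wald interval is forced to be symmetric around the estimate, coef ± z·SE, whereas a profile interval need not be.

```r
## Wald 95% CI is estimate +/- z * SE with z = qnorm(0.975) ~= 1.96.
## Back out the SE implied by model 2's interval (-7.89, 0.88):
z  <- qnorm(0.975)
se <- (0.88 - (-7.89)) / (2 * z)    # implied SE, roughly 2.24
-3.5 + c(-1, 1) * z * se            # ~ c(-7.89, 0.88), symmetric around -3.5
```

By contrast, model 1's profile interval (-11.6, 4.85) is not symmetric around -3.5, because it follows the shape of the (penalized) log-likelihood rather than a quadratic approximation to it.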

Things get stranger if I add another binomial covariate X2:

                Y.yes  Y.no
X1.yes  X2.yes  0      12
        X2.no   0      10

X1.no   X2.yes  6      191
        X2.no   1      165

Here are the regressions:

1. pl=T;firth=T    
    X1 coef:-1.63    ci.low:-2.79   ci.high:-0.38   p.val:1
    X2 coef:3.15     ci.low:-0.72   ci.high:11.2    p.val:1
2. pl=F;firth=T    
    X1 coef:-1.63    ci.low:-4.24   ci.high:0.97    p.val:0.220
    X2 coef:3.15     ci.low:-2.11   ci.high:8.42    p.val:0.240
3. pl=T;firth=F    
    X1 coef:-1.08    ci.low:-5.17   ci.high:2.86    p.val:0.006
    X2 coef:3.35     ci.low:1.48    ci.high:11.5    p.val:1
4. pl=F;firth=F    
    X1 coef:-1.08    ci.low:-4.27   ci.high:2.10    p.val:0.504
    X2 coef:3.35     ci.low:-2.71   ci.high:9.42    p.val:0.279
5. glm()    
    X1 coef:-15.6    ci.low:NA      ci.high:196.4   p.val:0.994
    X2 coef:1.645    ci.low:-0.13   ci.high:4.58    p.val:0.130

In the results we see some oddities: in model 1, the CI for X1 lies entirely below zero, yet the p value is maximal; in model 3 we have the opposite, a significant p value for X1 but a CI that includes zero. And again, glm gives totally different values. Furthermore, the p values change dramatically across the models, meaning that choosing the wrong model would totally change the interpretation of the study!

So my questions are: how can I explain the results reported above? And which criteria should I follow to choose the right options for logistf?

thanks!

Best Answer

The fact that firth=FALSE doesn't give results similar to glm is puzzling to me -- hopefully someone else can answer that part. As far as pl goes, though, you're almost always better off with profile confidence intervals. The Wald confidence intervals assume that the (implicit) log-likelihood surface is locally quadratic, which is often a bad approximation. Aside from being more computationally intensive, profile confidence intervals are always (? I would welcome counterexamples ?) more accurate. The "improved" p values you get from the Wald estimates are likely overoptimistic.

Generate data:

dd <- data.frame(X = rep(c("yes","no"), c(22, 363)),
                 Y = rep(c("no","yes","no"), c(22, 7, 356)))
with(dd, table(X, Y))

Replicate:

m_glm <- glm(Y ~ X, family = binomial, data = dd)
library("logistf")
m_fp <- logistf(Y ~ X, data = dd, pl = TRUE,  firth = TRUE)
m_mp <- logistf(Y ~ X, data = dd, pl = TRUE,  firth = FALSE)
m_fw <- logistf(Y ~ X, data = dd, pl = FALSE, firth = TRUE)
m_mw <- logistf(Y ~ X, data = dd, pl = FALSE, firth = FALSE)

Compare Wald (confint.default) with profile CIs for glm (in this case the profile intervals are actually narrower).

confint.default(m_glm)  ## {-2740, 2710}
confint(m_glm)          ## {NA, 118}
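The huge default (Wald) interval can also be reconstructed by hand from the coefficient and its enormous standard error, which makes clear why it is useless under separation (a self-contained sketch; the coefficient name "Xyes" assumes the default treatment coding of the factor):

```r
## Refit the separated model and build the Wald CI manually.
dd <- data.frame(X = rep(c("yes","no"), c(22, 363)),
                 Y = rep(c("no","yes","no"), c(22, 7, 356)))
m_glm <- glm(Y ~ X, family = binomial, data = dd)
b  <- coef(m_glm)["Xyes"]                 # near -Inf due to separation
se <- sqrt(diag(vcov(m_glm)))["Xyes"]     # huge SE
b + c(-1, 1) * qnorm(0.975) * se          # matches confint.default(m_glm)
```

The profile interval from confint() instead traces the actual log-likelihood, which is why it can be finite on one side and NA on the other.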

Compare with the glm2 package (just to make sure that glm isn't doing something wonky):

library("glm2")
glm2(Y~X,family=binomial,data=dd)
## similar results to glm(...)
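For completeness, the two-predictor table from the question can be generated the same way (a sketch; the counts are read directly off the table in the question, so check them against your actual data):

```r
## Reconstruct the X1 x X2 x Y table: 12 + 10 rows with X1 = yes
## (all Y = no), then 6 + 191 and 1 + 165 rows with X1 = no.
dd2 <- data.frame(
  X1 = rep(c("yes","no"), c(22, 363)),
  X2 = rep(c("yes","no","yes","no"), c(12, 10, 197, 166)),
  Y  = rep(c("no","no","yes","no","yes","no"),
           c(12, 10, 6, 191, 1, 165)))
ftable(with(dd2, table(X1, X2, Y)))
```

Fitting logistf(Y ~ X1 + X2, data = dd2) with the various pl/firth settings should then reproduce the second set of results above.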