Solved – Model selection with Firth logistic regression

Tags: aic, logistic, model-selection, separation

In a small data set ($n \sim 100$) that I am working with, several variables give me perfect prediction/separation. I therefore use Firth logistic regression to deal with the issue.

If I select the best model by AIC or BIC, should I include the Firth penalty term in the likelihood when computing these information criteria?

Best Answer

If you want to justify the use of BIC: you can replace the maximum likelihood with the likelihood evaluated at the maximum a posteriori (MAP) estimate, and the resulting 'BIC'-type criterion remains asymptotically valid (in the limit as the sample size $n \to \infty$). As mentioned by @probabilityislogic, Firth's logistic regression is equivalent to using a Jeffreys prior, so what you obtain from your regression fit is the MAP estimate.
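To make this concrete, here is a minimal sketch (not a production implementation; the function names `fit_firth` and `bic_map` are my own) of fitting Firth's penalized likelihood $\ell(\beta) + \tfrac12 \log\lvert X^\top W X\rvert$ directly with a general-purpose optimizer, and then plugging the resulting MAP estimate into the usual BIC formula in place of the MLE:

```python
import numpy as np
from scipy.optimize import minimize

def neg_firth_loglik(beta, X, y):
    """Negative Firth-penalized log-likelihood: -(l(beta) + 0.5*log|X' W X|)."""
    eta = X @ beta
    # ordinary logistic log-likelihood, written stably via logaddexp
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)                              # Fisher-information weights
    _, logdet = np.linalg.slogdet((X.T * w) @ X)   # log|X' W X| (Jeffreys prior)
    return -(loglik + 0.5 * logdet)

def fit_firth(X, y):
    """MAP estimate under the Jeffreys prior; stays finite under separation."""
    beta0 = np.zeros(X.shape[1])
    return minimize(neg_firth_loglik, beta0, args=(X, y), method="BFGS").x

def bic_map(X, y, beta_map):
    """BIC-type criterion with the MAP plugged in where BIC uses the MLE."""
    eta = X @ beta_map
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    n, k = X.shape
    return -2.0 * loglik + k * np.log(n)

# Perfectly separated toy data: the ordinary MLE diverges, the MAP does not.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = (x > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])
beta = fit_firth(X, y)
print(beta, bic_map(X, y, beta))
```

Note that `bic_map` uses the *unpenalized* log-likelihood at the MAP, which matches the usual BIC definition; whether to include the penalty there changes the criterion only by an $O(1)$ term, which is asymptotically negligible.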

The BIC is a pseudo-Bayesian criterion which is (roughly) derived using a Taylor series expansion of the marginal likelihood $$p_y(y) = \int L(\theta; y)\pi(\theta)\mathrm{d} \theta$$ around the maximum likelihood estimate $\hat{\theta}$. Thus it ignores the prior, but the effect of the latter vanishes as information concentrates in the likelihood.
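Sketching that expansion (a standard Laplace-approximation argument, with $k$ the number of parameters and $\bar I(\hat\theta)$ the average observed information per observation):

$$\log p_y(y) = \log L(\hat{\theta}; y) + \log \pi(\hat{\theta}) + \frac{k}{2}\log\frac{2\pi}{n} - \frac{1}{2}\log\bigl\lvert \bar I(\hat{\theta})\bigr\rvert + O_p\!\left(n^{-1/2}\right),$$

so that $-2\log p_y(y) = -2\log L(\hat{\theta}; y) + k\log n + O_p(1)$. The prior contribution $\log\pi(\hat{\theta})$ sits inside the $O_p(1)$ remainder, which is why dropping it (and why evaluating at the MAP rather than the MLE) does not affect the criterion asymptotically.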

As a side remark, Firth's penalty also removes the first-order term of the asymptotic bias of the maximum likelihood estimator in exponential families.