One trick that often helps with logistic-regression-type problems is to note that
$1 - h(x^{(i)}) = h(-x^{(i)})$
and that computing $h(-x^{(i)})$ directly is more numerically stable than computing $1 - h(x^{(i)})$.
You can find a discussion of that here. This is an article in the Stata Journal so the examples are in Stata/Mata, but the problem has to do with the way computers store numbers and is thus more general. For example, I have been able to reproduce the first anomalous example exactly in R, i.e. not just the general pattern but the exact values.
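Here is a minimal sketch of the issue in R, using plogis() as the logistic $h$; the value 40 is arbitrary, chosen only so that $h$ is extremely close to 1:

eta <- 40
1 - plogis(eta)   ## 0 -- the true value, about 4.2e-18, is lost when rounded against 1
plogis(-eta)      ## about 4.2e-18 -- the same quantity, computed directly
## The loss matters most on the log scale, e.g. inside a log-likelihood:
log(1 - plogis(eta))                            ## -Inf
plogis(eta, lower.tail = FALSE, log.p = TRUE)   ## about -40, computed without underflow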
The fact that firth=FALSE doesn't give results similar to glm is puzzling to me -- hopefully someone else can answer. As far as pl goes, though, you're almost always better off with profile confidence intervals. Wald confidence intervals assume that the (implicit) log-likelihood surface is locally quadratic, which is often a bad approximation. Apart from being more computationally intensive, profile confidence intervals are always (I would welcome counterexamples) more accurate. The "improved" p-values you get from the Wald estimates are likely overoptimistic.
Generate data:
dd <- data.frame(X = rep(c("yes","no"), c(22,363)),
                 Y = rep(c("no","yes","no"), c(22,7,356)))
with(dd, table(X, Y))
Replicate:
m_glm <- glm(Y ~ X, family = binomial, data = dd)
library("logistf")
m_fp <- logistf(Y ~ X, data = dd, pl = TRUE,  firth = TRUE)   ## Firth penalty, profile-likelihood CIs
m_mp <- logistf(Y ~ X, data = dd, pl = TRUE,  firth = FALSE)  ## no penalty, profile-likelihood CIs
m_fw <- logistf(Y ~ X, data = dd, pl = FALSE, firth = TRUE)   ## Firth penalty, Wald CIs
m_mw <- logistf(Y ~ X, data = dd, pl = FALSE, firth = FALSE)  ## no penalty, Wald CIs
Compare Wald CIs (confint.default) with profile CIs for glm (in this case the profile intervals are actually narrower):
confint.default(m_glm) ## {-2740, 2710}
confint(m_glm) ## {NA, 118}
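The corresponding limits from the logistf fits can be inspected in the same way; I'm going from memory of the logistf object structure here, so the ci.lower/ci.upper component names are an assumption and may differ across package versions:

summary(m_fp)   ## Firth penalty, profile-likelihood CIs
summary(m_mw)   ## no penalty, Wald CIs (closest in spirit to confint.default(m_glm))
cbind(lower = m_fp$ci.lower, upper = m_fp$ci.upper)   ## assumed component names for the stored limits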
Compare with the glm2 package (just to make sure that glm isn't doing something wonky):
library("glm2")
glm2(Y ~ X, family = binomial, data = dd)
## similar results to glm(...)
Best Answer
I believe there exists no optimal initial value. As stated by user10525, the value $\beta = 0$ works well and is the default choice for glm. In order to check what's going on with glm, I would follow these basic steps:

1. Change the number of iterations of the Newton-Raphson algorithm in glm by adding control=glm.control(maxit=Y), where Y is the number of desired iterations. I would even begin with Y=1 to check the stability of the algorithm at $\beta = 0$.

2. By adding start=c(a,b,c,...) you can change the initial values of the regression. Note that the length of start must be equal to $p+1$, where $p$ denotes the number of covariates in your logistic regression. (A short sketch of steps 1 and 2 follows after this list.)

3. Analyse the stability of the regression for more choices of the above parameters; you could probably find at least a range of initial values corresponding to "convergent" logistic regressions for a given number of iterations.
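Here is a minimal sketch of steps 1 and 2, reusing the dd data frame generated above; the starting values are arbitrary and only illustrate the mechanics:

## Step 1: restrict glm to a single Newton-Raphson (IWLS) iteration
## (expect a non-convergence warning -- that is the point of the check)
m_one_iter <- glm(Y ~ X, family = binomial, data = dd,
                  control = glm.control(maxit = 1))
## Step 2: supply explicit starting values; the length must be p + 1 = 2 here
## (intercept plus the single dummy-coded covariate)
m_started <- glm(Y ~ X, family = binomial, data = dd,
                 start = c(0, 0))
coef(m_one_iter)
coef(m_started)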
Andrew Gelman discusses a nice example of divergence of glm in the presence of "bad" initial value choices on his blog: http://andrewgelman.com/2011/05/04/whassup_with_gl/

Please also note that glm "explodes" in the presence of complete separation, since the MLE does not exist: this is also something to check (see the check sketched below). I hope this can be applied to your case.