One trick that often helps with logistic-regression-type problems is to note that
$1 - h(x^{(i)}) = h(-x^{(i)})$
and that computing $h(-x^{(i)})$ directly is more numerically stable than computing $1 - h(x^{(i)})$.
You can find a discussion of that here. This is an article in the Stata Journal so the examples are in Stata/Mata, but the problem has to do with the way computers store numbers and is thus more general. For example, I have been able to reproduce the first anomalous example exactly in R, i.e. not just the general pattern but the exact values.
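Here is a minimal sketch of the issue in R, using plogis() as the logistic $h$; the value 40 is arbitrary, chosen only so that $h$ is extremely close to 1:

eta <- 40
1 - plogis(eta)   ## 0 -- the true value, about 4.2e-18, is lost when rounded against 1
plogis(-eta)      ## about 4.2e-18 -- the same quantity, computed directly
## The loss matters most on the log scale, e.g. inside a log-likelihood:
log(1 - plogis(eta))                            ## -Inf
plogis(eta, lower.tail = FALSE, log.p = TRUE)   ## about -40, computed without underflow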
The fact that firth=FALSE doesn't give results similar to glm is puzzling to me -- hopefully someone else can answer. As far as pl goes, though, you're almost always better off with profile confidence intervals. Wald confidence intervals assume that the (implicit) log-likelihood surface is locally quadratic, which is often a bad approximation. Apart from being more computationally intensive, profile confidence intervals are always (I would welcome counterexamples) more accurate. The "improved" p-values you get from the Wald estimates are likely overoptimistic.
Generate data:
dd <- data.frame(X = rep(c("yes","no"), c(22,363)),
                 Y = rep(c("no","yes","no"), c(22,7,356)))
with(dd, table(X, Y))
Replicate:
m_glm <- glm(Y ~ X, family = binomial, data = dd)
library("logistf")
m_fp <- logistf(Y ~ X, data = dd, pl = TRUE,  firth = TRUE)   ## Firth penalty, profile-likelihood CIs
m_mp <- logistf(Y ~ X, data = dd, pl = TRUE,  firth = FALSE)  ## no penalty, profile-likelihood CIs
m_fw <- logistf(Y ~ X, data = dd, pl = FALSE, firth = TRUE)   ## Firth penalty, Wald CIs
m_mw <- logistf(Y ~ X, data = dd, pl = FALSE, firth = FALSE)  ## no penalty, Wald CIs
Compare Wald CIs (confint.default) with profile CIs for glm (in this case the profile intervals are actually narrower):
confint.default(m_glm) ## {-2740, 2710}
confint(m_glm) ## {NA, 118}
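The corresponding limits from the logistf fits can be inspected in the same way; I'm going from memory of the logistf object structure here, so the ci.lower/ci.upper component names are an assumption and may differ across package versions:

summary(m_fp)   ## Firth penalty, profile-likelihood CIs
summary(m_mw)   ## no penalty, Wald CIs (closest in spirit to confint.default(m_glm))
cbind(lower = m_fp$ci.lower, upper = m_fp$ci.upper)   ## assumed component names for the stored limits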
Compare with the glm2 package (just to make sure that glm isn't doing something wonky):
library("glm2")
glm2(Y ~ X, family = binomial, data = dd)
## similar results to glm(...)
Best Answer
I believe there exists no optimal initial value. As stated by user10525, the value $\beta = 0$ works well and is the default choice for glm. In order to check what's going on with glm, I would follow these basic steps:

1. Change the number of iterations of the Newton-Raphson algorithm in glm by adding control=glm.control(maxit=Y), where Y is the number of desired iterations. I would even begin with Y=1 to check the stability of the algorithm at $\beta = 0$.

2. By adding start=c(a,b,c,...) you can change the initial values of the regression. Note that the length of start must be equal to $p+1$, where $p$ denotes the number of covariates in your logistic regression. (A short sketch of steps 1 and 2 follows after this list.)

3. Analyse the stability of the regression for more choices of the above parameters; you could probably find at least a range of initial values corresponding to "convergent" logistic regressions for a given number of iterations.
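Here is a minimal sketch of steps 1 and 2, reusing the dd data frame generated above; the starting values are arbitrary and only illustrate the mechanics:

## Step 1: restrict glm to a single Newton-Raphson (IWLS) iteration
## (expect a non-convergence warning -- that is the point of the check)
m_one_iter <- glm(Y ~ X, family = binomial, data = dd,
                  control = glm.control(maxit = 1))
## Step 2: supply explicit starting values; the length must be p + 1 = 2 here
## (intercept plus the single dummy-coded covariate)
m_started <- glm(Y ~ X, family = binomial, data = dd,
                 start = c(0, 0))
coef(m_one_iter)
coef(m_started)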
Andrew Gelman discusses a nice example of divergence of glm in the presence of "bad" initial value choices on his blog: http://andrewgelman.com/2011/05/04/whassup_with_gl/

Please also note that glm "explodes" in the presence of complete separation, since the MLE does not exist: this is also something to check (see the check sketched below). I hope this can be applied to your case.