Solved – Initial values for logistic regression using maximum likelihood

logistic, maximum likelihood, r

I'm trying to calculate logistic regression coefficients by defining the log-likelihood function and using maximum likelihood.

In some cases, when the initial (start) values I passed to the optimizer were poorly chosen, I got wrong results for the logistic regression (different from the ones I get with glm, for example).

Given the input data and y values, what are the optimal initial values for logistic regression (or, in other words, what values does glm use)?

Best Answer

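I believe there is no single optimal initial value. As stated by user10525, the value β=0 works well in practice. Note, however, that glm does not start from β=0 by default: when start is not supplied, the binomial family initialises the fitted means as (weights*y + 0.5)/(weights + 1) and the iterations proceed from there.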
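As a minimal sketch of the approach described in the question, the following defines the logistic negative log-likelihood by hand, minimises it with optim starting from β=0, and compares the result with glm. The simulated data, the variable names (x1, x2, y) and the helper names negloglik and neggrad are assumptions made for illustration; they do not come from the original post.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x1 - 0.8 * x2))
X  <- cbind(1, x1, x2)   # design matrix with an intercept column

# Negative log-likelihood of the logistic model.
# log(1 + exp(eta)) is computed in a numerically stable way.
negloglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  -sum(y * eta - (pmax(eta, 0) + log1p(exp(-abs(eta)))))
}

# Analytic gradient of the negative log-likelihood: -X'(y - p)
neggrad <- function(beta, X, y) {
  p <- plogis(drop(X %*% beta))
  -drop(crossprod(X, y - p))
}

# Maximum likelihood via optim, started at beta = 0
fit_optim <- optim(par = rep(0, ncol(X)), fn = negloglik, gr = neggrad,
                   X = X, y = y, method = "BFGS")

# Reference fit
fit_glm <- glm(y ~ x1 + x2, family = binomial)

# Side-by-side comparison of the two sets of estimates
cbind(optim = fit_optim$par, glm = coef(fit_glm))
```

Supplying the analytic gradient and a stable form of log(1 + exp(eta)) tends to make the optimiser less sensitive to the starting point, although it does not remove the dependence entirely.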

To check what glm is doing, I would go through these basic steps:

  1. Change the maximum number of iterations of the Newton-Raphson algorithm in glm by adding control=glm.control(maxit=Y), where Y is the desired number of iterations. I would even begin with Y=1 to check the stability of the algorithm at β=0.

  2. Adding start=c(a,b,c,...) changes the initial value used in the regression. Note that the length of start must equal $p+1$, where $p$ denotes the number of covariates in your logistic regression (one value for the intercept plus one per covariate, in the same order as the coefficients).

  3. Analyse the stability of the regression over several choices of the above parameters; you will probably find at least a range of initial values that give "convergent" logistic regressions for a given number of iterations.

  4. Andrew Gelman discusses a nice example of glm diverging under "bad" initial value choices on his blog: http://andrewgelman.com/2011/05/04/whassup_with_gl/

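  5. Please note that glm "explodes" under complete separation because the MLE does not exist: the coefficient estimates drift towards ±∞, and glm typically warns that fitted probabilities numerically 0 or 1 occurred. This is also something to check.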
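The checks in the list above can be sketched roughly as follows. The simulated data, the particular start vectors and the tiny separated data set (xs, ys) are assumptions chosen for illustration, not values from the question.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x1 - 0.8 * x2))

# Step 1: restrict glm to a single iteration and inspect the convergence flag
fit1 <- suppressWarnings(
  glm(y ~ x1 + x2, family = binomial, control = glm.control(maxit = 1))
)
fit1$converged

# Step 2: supply explicit starting values (intercept + 2 covariates = 3 values)
fit0        <- glm(y ~ x1 + x2, family = binomial, start = c(0, 0, 0))
fit_default <- glm(y ~ x1 + x2, family = binomial)
max(abs(coef(fit0) - coef(fit_default)))

# Step 3: scan a few starting points and record which ones converge
starts <- list(c(0, 0, 0), c(1, 1, 1), c(-2, 2, -2), c(5, 5, 5))
conv <- sapply(starts, function(s) {
  f <- tryCatch(
    suppressWarnings(glm(y ~ x1 + x2, family = binomial, start = s)),
    error = function(e) NULL
  )
  !is.null(f) && f$converged
})
conv

# Step 5: a completely separated toy data set; the slope estimate becomes
# very large because no finite MLE exists
xs <- c(-3, -2, -1, 1, 2, 3)
ys <- c(0, 0, 0, 1, 1, 1)
fit_sep <- suppressWarnings(glm(ys ~ xs, family = binomial))
coef(fit_sep)
```

The warnings are suppressed here only to keep the output short; when diagnosing a real fit it is more informative to let glm print them.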

I hope this can be applied to your case.