If the two classes are linearly separable, iteratively reweighted least squares (IRLS) breaks down. In that scenario any hyperplane that separates the two classes is a solution, and there are infinitely many of them. IRLS is meant to find a maximum likelihood solution, and maximum likelihood has no mechanism to favor one of these solutions over the others (e.g. no concept of maximum margin). Depending on the initialization, IRLS heads toward one of these solutions and eventually breaks because of numerical problems (I don't know the details of IRLS, so this is an educated guess).
Another problem arises when the training data are linearly separable: every separating-hyperplane solution corresponds to a Heaviside (step) function, so all the predicted probabilities are either 0 or 1, and the logistic regression solution becomes a hard classifier rather than a probabilistic one.
To state this in mathematical notation: the Heaviside function is the limit of the sigmoid, $\lim_{\|\mathbf{w}\| \rightarrow \infty}\sigma(\mathbf{w}^T \mathbf{x} + b)$, where $\sigma$ is the sigmoid function and $(\mathbf{w}, b)$ determines the separating hyperplane. So, in theory, IRLS never stops and keeps moving toward a $\mathbf{w}$ of ever-increasing magnitude; in practice it breaks down because of numerical problems.
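A tiny R sketch (my own illustration, not from the original answer; the toy data and the ll_sep helper are made up) shows how, with separable data, scaling up a separating hyperplane keeps increasing the log-likelihood, so maximum likelihood has no finite optimum:
## toy 1-D data, perfectly separable at x = 0 (illustrative only)
x <- c(-2, -1, 1, 2)
y <- c( 0,  0, 1, 1)
## Bernoulli log-likelihood of the scaled separating solution s * (w, b)
ll_sep <- function(s, w = 1, b = 0) {
  p <- plogis(s * (w * x + b))        # sigmoid(s * (w x + b))
  sum(y * log(p) + (1 - y) * log(1 - p))
}
sapply(c(1, 2, 5, 10, 50), ll_sep)    # increases towards 0, its supremum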
The numDeriv package can indeed be used to compute the gradient and the Hessian (if needed). In both cases the argument y of the log-likelihood is passed through the dots mechanism, using an argument with a suitable name. For a vector-valued function, the jacobian function of the same package can be used similarly. You could also consider computing analytical derivatives rather than numerical ones.
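The snippets below assume that loglik, y and b already exist (presumably defined in the question). Purely for illustration, here is a minimal sketch of what they could look like, assuming a Gaussian AR(1) model consistent with mlogDens further below; the simulated data, starting values and bounds are my own assumptions:
## simulate an AR(1) series (illustrative data only)
set.seed(123)
T <- 200
y <- numeric(T)
for (t in 2:T) y[t] <- 0.5 + 0.8 * y[t - 1] + rnorm(1)
## log-likelihood with theta = (intercept, AR coefficient, sd)
loglik <- function(theta, y) {
  T <- length(y)
  sum(dnorm(y[2:T], mean = theta[1] + theta[2] * y[1:(T - 1)],
            sd = theta[3], log = TRUE))
}
## maximize the log-likelihood; b$par holds the estimates
b <- optim(par = c(0, 0.5, 1), fn = loglik, y = y,
           method = "L-BFGS-B", lower = c(-Inf, -Inf, 1e-6),
           control = list(fnscale = -1))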
library(numDeriv)
H <- hessian(func = loglik, x = b$par, y = y)
g <- grad(func = loglik, x = b$par, y = y)
We can also compute the Jacobian of a function that returns a vector of length $T-1$.
## negative log-density of each observation under a Gaussian AR(1) model,
## with theta = (intercept, AR coefficient, sd)
mlogDens <- function(theta, y) {
  T <- length(y)
  -dnorm(y[2:T], mean = theta[1] + theta[2] * y[1:(T - 1)],
         sd = theta[3], log = TRUE)
}
## a matrix with dim (T-1, 3)
G <- jacobian(func = mlogDens, x = b$par, y = y)
## a matrix with dim (9, T-1)
GG <- apply(G, MARGIN = 1, FUN = tcrossprod)
## a vector with length 9 representing a symmetric mat.
GGsum0 <- apply(GG, MARGIN = 1, FUN = sum)
## a symmetric mat.
GGsum <- matrix(GGsum0, nrow = 3, ncol = 3)
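As a quick check (my addition, not in the original answer), the same outer-product-of-gradients matrix can be obtained in one step, since summing the per-observation outer products is just a cross product of G with itself:
## equivalent, more direct computation: t(G) %*% G
GGsum2 <- crossprod(G)
all.equal(GGsum, GGsum2, check.attributes = FALSE)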
Best Answer
You are correct that, in general, IWLS, like other numerical optimization methods, can only guarantee convergence to a local maximum, if it converges at all. Here's a nice example where the starting value was outside the convergence domain of the algorithm used by glm() in R. However, it is worth noting that for GLMs with the canonical link the likelihood is concave, see here. Thus, if the algorithm converges, it will have converged to the global mode!
The last issue pointed out in the slide is a problem where the MLE for a parameter is at infinity. This can occur in logistic regression when there is complete separation. In that case you will get a warning message that the fitted probabilities are numerically 0 or 1. It is important to note that when this occurs the algorithm has not converged to the mode, so this has nothing to do with the algorithm being stuck in a local maximum.
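To see this concretely, here is a small made-up example of complete separation (mine, not from the original answer); glm() warns that fitted probabilities numerically 0 or 1 occurred and the slope estimate blows up:
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)   # perfectly separated at x = 3.5
fit <- glm(y ~ x, family = binomial)
coef(fit)                  # huge slope: the MLE is at infinity
fitted(fit)                # probabilities numerically 0 or 1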