Solved – What are some reasons iteratively reweighted least squares would not converge when used for logistic regression

convergence, generalized-linear-model, irls, logistic, r

I've been using the glm.fit function in R to fit parameters to a logistic regression model. By default, glm.fit uses iteratively reweighted least squares to fit the parameters. What are some reasons this algorithm would fail to converge, when used for logistic regression?

Best Answer

If the two classes are linearly separable, iteratively reweighted least squares (IRLS) breaks down. In that scenario, any hyperplane that separates the two classes is a solution, and there are infinitely many of them. IRLS is meant to find a maximum likelihood solution, but with separable data the likelihood can always be increased by scaling the weights up, so no finite maximizer exists; maximum likelihood also has no mechanism to favor one of these solutions over another (e.g., no concept of maximum margin). Depending on the initialization, IRLS heads toward one of these solutions and breaks down due to numerical problems (I don't know the details of IRLS; this is an educated guess).
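
Here is a minimal R sketch of this failure mode (the data and variable names are illustrative, not from the question): ten points perfectly separated at a threshold, fed to `glm` with the binomial family.

```r
## Perfectly separable toy data: the maximum likelihood estimate does not exist.
x <- 1:10
y <- rep(c(0, 1), each = 5)   # y = 0 for x <= 5, y = 1 for x >= 6

fit <- glm(y ~ x, family = binomial)
## Typical warnings with separated data include
##   glm.fit: fitted probabilities numerically 0 or 1 occurred
## and often also
##   glm.fit: algorithm did not converge
coef(fit)   # huge slope: IRLS was heading toward infinite weights when it stopped
```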

A related problem with linearly separable training data is that any of the hyperplane solutions corresponds to a Heaviside step function, so all the fitted probabilities are either 0 or 1. The logistic regression solution would then be a hard classifier rather than a probabilistic classifier.
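
Continuing the sketch above, the fitted probabilities show this hard-classifier behaviour directly:

```r
## The fitted values collapse to (numerically) 0 or 1 on the training data,
## so the fit behaves like a hard classifier rather than a probabilistic one.
round(fitted(fit), 6)
```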

To put this in mathematical notation: the Heaviside function is $\lim_{\|\mathbf{w}\| \rightarrow \infty}\sigma(\mathbf{w}^T \mathbf{x} + b)$, the limit of the sigmoid, where $\sigma$ is the logistic sigmoid function and $(\mathbf{w}, b)$ determines the separating hyperplane. So, in theory, IRLS never stops: it keeps moving toward a $\mathbf{w}$ of ever-increasing magnitude, and in practice it breaks down due to numerical problems.
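
One way to watch the divergence (still using the toy data above) is to cap the number of IRLS iterations via `glm.control` and tighten `epsilon` so `glm` does not declare convergence early; the fitted slope should keep growing with `maxit` until floating-point limits kick in. The particular control values here are illustrative.

```r
## Slope magnitude as a function of the IRLS iteration budget.
for (m in c(5, 10, 25, 50)) {
  f <- suppressWarnings(
    glm(y ~ x, family = binomial,
        control = glm.control(maxit = m, epsilon = 1e-14))
  )
  cat(sprintf("maxit = %2d   slope = %12.4f\n", m, coef(f)[2]))
}
```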