Solved – Numerical stability of IWLS for Gamma models with log-link

gamma-distribution, generalized-linear-model, log-linear

The combination of a $\Gamma$-distribution with the log-link function in a generalized linear model can be a useful model. However, in my experience the iterative weighted least squares (IWLS) algorithm (specifically, the implementation in R via glm) does not always converge.

The log-link is not the canonical link for the $\Gamma$-distribution, and for this reason you cannot use standard results from exponential families to discuss questions about existence and uniqueness of the solution to the score equation.
An analysis specifically of the $\Gamma$-distribution with the log-link is fairly straightforward, but the question may well have been addressed by others before me; I have just not been able to find anything in the literature.

My question is therefore whether there are any references on the numerical aspects of generalized linear models with the $\Gamma$-distribution and the log-link. I am specifically interested in discussions related to the IWLS algorithm and whether there are more stable alternatives.

Edit: A small example.

Suppose that $Y$ is log-normally distributed with $\log Y \sim \mathcal{N}(\beta_0 + \beta_1 x, \sigma^2)$. Then
$$\log(E Y) = \beta_0 + \sigma^2/2 + \beta_1 x$$
and $VY = (e^{\sigma^2} - 1) (EY)^2$. Thus this model has the same mean-variance structure as a $\Gamma$-model with log-link and dispersion parameter $\psi = e^{\sigma^2} - 1$. If we keep the reparametrization of the intercept in mind, this model can be fitted using glm with family = Gamma("log"). The parameter estimates will not be the MLE, and the estimates are not efficient when compared to simply log-transforming $Y$ and using lm.
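The mean-variance identity above can be checked numerically. The following sketch (with an illustrative choice of $\mu$ and $\sigma$, not taken from the example below) simulates a large log-normal sample and compares the empirical mean and variance with the theoretical values $EY = e^{\mu + \sigma^2/2}$ and $VY = \psi (EY)^2$:

```r
## Numerical check of the mean-variance identity: for log Y ~ N(mu, sigma^2),
## E Y = exp(mu + sigma^2/2) and V Y = psi * (E Y)^2 with psi = exp(sigma^2) - 1,
## which is exactly the Gamma mean-variance relation.
set.seed(1)
sigma <- 0.5
mu <- 1
y <- exp(rnorm(1e6, mu, sigma))
EY_theory <- exp(mu + sigma^2 / 2)
VY_theory <- (exp(sigma^2) - 1) * EY_theory^2
c(empirical = mean(y), theoretical = EY_theory)  # close agreement
c(empirical = var(y),  theoretical = VY_theory)  # close agreement
```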

sigma <- 4
beta <- 1
beta0 <- 1
n <- 100
x <- rnorm(n)
y <- exp(beta0 + beta * x + rnorm(n, 0, sigma))
fit1 <- lm(log(y) ~ x)                          # least squares on the log scale
sigmasqhat <- sum(fit1$residuals^2) / (n - 2)   # estimate of sigma^2
betastart <- coef(fit1) + c(sigmasqhat / 2, 0)  # shift intercept by sigma^2 / 2
fit <- glm(y ~ x,
       start = betastart,
       family = Gamma("log"))

The code above will need to be run a couple of times to discover a problematic data set. I get an error around 2-3% of the time. More frequently, the algorithm fails to converge within the default maximum number of iterations (which is 25), but this problem can easily be fixed by increasing maxit. The errors I encounter are "NA/NaN/Inf in 'x'" and "inner loop 1; cannot correct step size".
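To see how often this happens without rerunning the script by hand, the simulation can be wrapped in a function and repeated, catching errors with tryCatch (a sketch following the setup above; the function name sim_fit is mine):

```r
## Tally convergence failures over repeated simulations of the example.
## Errors (e.g. "inner loop 1; cannot correct step size") are caught and
## counted separately from fits that merely fail to converge in 25 iterations.
set.seed(42)
sim_fit <- function(n = 100, beta0 = 1, beta = 1, sigma = 4) {
  x <- rnorm(n)
  y <- exp(beta0 + beta * x + rnorm(n, 0, sigma))
  fit1 <- lm(log(y) ~ x)
  betastart <- coef(fit1) + c(sum(fit1$residuals^2) / (n - 2) / 2, 0)
  tryCatch({
    fit <- glm(y ~ x, start = betastart, family = Gamma("log"))
    if (fit$converged) "ok" else "no convergence"
  }, error = function(e) "error")
}
table(replicate(200, sim_fit()))
```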

The example above was constructed for educational purposes, to investigate differences and similarities between the log-normal and the $\Gamma$ models. I also have a couple of data sets where a $\Gamma$-model with a log-link seems to fit well. It is not a problem to fit the model to the data set itself, but once I bootstrap I start encountering similar convergence problems on a few of the bootstrapped data sets.

Best Answer

I wouldn't say I'm an expert on this, so I may well be missing something, but here are some relevant points:

  • With optimization, whether one is doing nonlinear least squares, GLMs or something else, a good set of starting values can be important. With gamma+log-link, a robust linear model fitted to the logs may be sufficient in many cases. R's glm allows you to specify starting values.

  • While Fisher scoring (which is effectively IWLS on a transformed problem) is generally quite good, it doesn't converge nicely in all circumstances. There are a huge number of other optimization algorithms available, and glm lets you supply an alternative fitting routine through its method argument, either as a function or as a string naming one.
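Both points can be combined in a short sketch (illustrative parameter values, a milder sigma than in the question): robust starting values from MASS::rlm on the log scale, plus a larger iteration cap via glm.control. The glm2 package, which replaces the IWLS update with step-halving on the deviance, is a drop-in alternative worth trying if it is installed:

```r
## Sketch: robust starting values plus a larger iteration cap.
## MASS ships with R; rlm gives a robust linear fit on the log scale,
## whose coefficients serve as starting values for the Gamma/log-link fit.
library(MASS)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- exp(1 + x + rnorm(n, 0, 2))
start <- coef(rlm(log(y) ~ x))  # robust fit on the log scale
fit <- glm(y ~ x, family = Gamma("log"), start = start,
           control = glm.control(maxit = 100))
## Optional: same call via glm2, which uses step-halving for stability.
# if (requireNamespace("glm2", quietly = TRUE))
#   fit2 <- glm2::glm2(y ~ x, family = Gamma("log"), start = start)
```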

The topic of optimization is incredibly broad (in the 'takes a book to answer' sense); it's not something that can be tackled in a few lines.