Solved – Numerical stability of IWLS for Gamma models with log-link

gamma-distribution, generalized-linear-model, log-linear

The combination of a $\Gamma$-distribution with the log-link function in a generalized linear model can be a useful model. However, in my experience the iterative weighted least squares (IWLS) algorithm (specifically, the implementation in R via glm) does not always converge.

The log-link is not the canonical link for the $\Gamma$-distribution, and for this reason you cannot use standard results from exponential families to discuss questions about existence and uniqueness of the solution to the score equation.
An analysis specifically of the $\Gamma$-distribution with the log-link is fairly straightforward, but the question may well have been addressed by others before me; I have just not been able to find anything in the literature.

My question is therefore whether there are any references on the numerical aspects of generalized linear models with the $\Gamma$-distribution and the log-link. I am specifically interested in discussions related to the IWLS algorithm and whether there are more stable alternatives.

Edit: A small example.

Suppose that $Y$ is log-normally distributed with $\log Y \sim \mathcal{N}(\beta_0 + \beta_1 x, \sigma^2)$. Then
$$\log(E Y) = \beta_0 + \sigma^2/2 + \beta_1 x$$
and $VY = (e^{\sigma^2} - 1) (EY)^2$. Thus this model has the same mean-variance structure as a $\Gamma$-model with log-link and dispersion parameter $\psi = e^{\sigma^2} - 1$. If we keep the reparametrization of the intercept in mind, this model can be fitted using glm with family = Gamma("log"). The parameter estimates will not be the MLE, and the estimates are not efficient when compared to simply log-transforming $Y$ and using lm.
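The mean-variance identity above can be checked numerically. The following sketch (with an illustrative choice of $\mu$ and $\sigma$, not taken from the example below) simulates a large log-normal sample and compares the empirical mean and variance with the theoretical values $EY = e^{\mu + \sigma^2/2}$ and $VY = \psi (EY)^2$:

```r
## Numerical check of the mean-variance identity: for log Y ~ N(mu, sigma^2),
## E Y = exp(mu + sigma^2/2) and V Y = psi * (E Y)^2 with psi = exp(sigma^2) - 1,
## which is exactly the Gamma mean-variance relation.
set.seed(1)
sigma <- 0.5
mu <- 1
y <- exp(rnorm(1e6, mu, sigma))
EY_theory <- exp(mu + sigma^2 / 2)
VY_theory <- (exp(sigma^2) - 1) * EY_theory^2
c(empirical = mean(y), theoretical = EY_theory)  # close agreement
c(empirical = var(y),  theoretical = VY_theory)  # close agreement
```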

sigma <- 4
beta <- 1
beta0 <- 1
n <- 100
x <- rnorm(n)
y <- exp(beta0 + beta * x + rnorm(n, 0, sigma))
fit1 <- lm(log(y) ~ x)                          # least squares on the log scale
sigmasqhat <- sum(fit1$residuals^2) / (n - 2)   # estimate of sigma^2
betastart <- coef(fit1) + c(sigmasqhat / 2, 0)  # shift intercept by sigma^2 / 2
fit <- glm(y ~ x,
       start = betastart,
       family = Gamma("log"))

The code above will need to be run a couple of times to discover a problematic data set. I get an error around 2-3% of the time. More frequently, the algorithm fails to converge within the default maximum number of iterations (which is 25), but this problem can easily be fixed by increasing maxit. The errors I encounter are "NA/NaN/Inf in 'x'" and "inner loop 1; cannot correct step size".
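To see how often this happens without rerunning the script by hand, the simulation can be wrapped in a function and repeated, catching errors with tryCatch (a sketch following the setup above; the function name sim_fit is mine):

```r
## Tally convergence failures over repeated simulations of the example.
## Errors (e.g. "inner loop 1; cannot correct step size") are caught and
## counted separately from fits that merely fail to converge in 25 iterations.
set.seed(42)
sim_fit <- function(n = 100, beta0 = 1, beta = 1, sigma = 4) {
  x <- rnorm(n)
  y <- exp(beta0 + beta * x + rnorm(n, 0, sigma))
  fit1 <- lm(log(y) ~ x)
  betastart <- coef(fit1) + c(sum(fit1$residuals^2) / (n - 2) / 2, 0)
  tryCatch({
    fit <- glm(y ~ x, start = betastart, family = Gamma("log"))
    if (fit$converged) "ok" else "no convergence"
  }, error = function(e) "error")
}
table(replicate(200, sim_fit()))
```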

The example above was constructed for educational purposes, to investigate differences and similarities between the log-normal and the $\Gamma$ models. I also have a couple of data sets where a $\Gamma$-model with a log-link seems to fit well. It is not a problem to fit the model to the data set itself, but once I bootstrap I start encountering similar convergence problems on a few of the bootstrapped data sets.

Best Answer

I wouldn't say I'm an expert on this, so I may well be missing something, but here are some relevant points:

  • With optimization, whether one is doing nonlinear least squares, GLMs or something else, a good set of starting values can be important. With gamma+log-link, a robust linear model fitted to the logs may be sufficient in many cases. R's glm allows you to specify starting values.

  • While Fisher scoring (which is effectively IWLS on a transformed problem) is generally quite good, it doesn't converge nicely in all circumstances. There are a huge number of other optimization algorithms available, and glm lets you supply an alternative fitting routine through its method argument, either as a function or as a string naming one.
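Both points can be combined in a short sketch (illustrative parameter values, a milder sigma than in the question): robust starting values from MASS::rlm on the log scale, plus a larger iteration cap via glm.control. The glm2 package, which replaces the IWLS update with step-halving on the deviance, is a drop-in alternative worth trying if it is installed:

```r
## Sketch: robust starting values plus a larger iteration cap.
## MASS ships with R; rlm gives a robust linear fit on the log scale,
## whose coefficients serve as starting values for the Gamma/log-link fit.
library(MASS)
set.seed(1)
n <- 100
x <- rnorm(n)
y <- exp(1 + x + rnorm(n, 0, 2))
start <- coef(rlm(log(y) ~ x))  # robust fit on the log scale
fit <- glm(y ~ x, family = Gamma("log"), start = start,
           control = glm.control(maxit = 100))
## Optional: same call via glm2, which uses step-halving for stability.
# if (requireNamespace("glm2", quietly = TRUE))
#   fit2 <- glm2::glm2(y ~ x, family = Gamma("log"), start = start)
```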

The topic of optimization is incredibly broad (in the 'takes a book to answer' sense); it's not something that can be tackled in a few lines.