GLMM Error – Resolving PIRLS Loop Resulting in NaN Value in GLMM Model with Gamma Distribution

gamma distributionglmmlme4-nlmemixed modelr

I have a problem fitting a GLMM model with a Gamma distribution (my outcome variable is strictly positive and right-skewed) and an identity link using glmer in R. Whenever I use a certain dataset, I get the error messag:

Error: PIRLS loop resulted in NaN value.

Here is a reprex of a simplified version of the model I'm trying to fit:

library(lme4)
#> Loading required package: Matrix
d <- tibble::tribble(
  ~A, ~R,    ~PID,
  307,       4, "47725",
  6589,       6, "47725",
  2174,       7, "47725",
  9939,       4, "47725",
  405,       4, "47725",
  2072,       5, "47725",
  493,       4, "47725",
  1393,       5, "47725",
  476,       5, "47725",
  2579,       6, "47725",
  3011,       6, "47725",
  59,       6, "47725",
  1948,       2, "47725",
  1653,       3, "47725",
  1407,       4, "47725",
  1021,       6, "47725",
  19650,       2, "55600",
  65683,       7, "55600",
  10380,       4, "55600",
  1519,       7, "55600",
  2747,       5, "55600",
  30487,       6, "55600",
  739,       5, "55600",
  3062,       6, "55600",
  608,       2, "55600",
  9131,       5, "55600",
  28631,       6, "55600",
  2972,       2, "55600",
  14251,       6, "55600",
  23742,       5, "55600"
)

fit <- glmer(A ~ 1 + R + (1 + R |PID), data=d, family=Gamma(link = "identity"))
#> Error in eval(expr, envir, enclos): PIRLS loop resulted in NaN value

My guess is that for some datapoints, the optimization algorithm explores coefficient values for which the result of the equation is negative, and therefore tries to estimate the parameters of the Gamma distribution using a negative number (which is outside its domain, and therefore returns NaN), even though all my real datapoints are positive (and always will be, negative values are nonsensical in my situation).

Using a log link solves the problem, but ideally I would really like to keep an identity link to keep the coefficient interpretations, and prediction of the future coefficients, as simple as possible. The data I have is from a pilot study, and as much as possible I would like to be able to do a power analysis on the coefficient of variable R without it being dependent on the value of the intercept – which would be the case in a model using a log link.

As of now, the only solution to keep the identity link I found was to exclude any A below a rather arbitrary threshold (some exclusion is justifiable, but a very clear theory-driven threshold is difficult to come up with), but I would rather not if possible.

Contrary to regular linear regressions, an arbitrary shift in the outcome variable changes both the intercept and the coefficient, so I don't think it is an option (or I'm missing something).

Can anyone confirm that my hypothesis regarding the reason of the error seems right, and would anyone have a way to fix it that I did not think of?

Best Answer

One of the key assumptions of using linear regression techniques is that the modeled relationship is linear. Plotting your data it seems that at least for PID = 55600, the relationship appears multiplicative (and not additive) on the measurement scale.

enter image description here

One way to satisfy the assumption is to model the expected value using (a) a suitable distribution that describes your data generating process and (b) apply a link function to your expected values by using a GLM(M). Using the log link helps to model a multiplicative relationship on the measurement scale as a linear relationship on the log scale, which is what you need when you assume a linear model for your data.

After you fitted your model, you can apply the inverse link to your coefficients and use those to fit a curved line (multiplicative effect) through the data on the measurement scale (see also ?predict() for this purpose)

I am not quite sure I understand what you mean by

but ideally I would really like to keep an Identity link to keep the coefficient interpretations, and prediction of the future coefficients, as simple as possible

Keeping Gamma(link = "identity") will not respect the domain of the Gamma distribution which is likely the reason as to why you see the penalized iteratively reweighted least squares (PIRLS) algorithm error. Which is what you described in your question. You could get around it by excluding certain values (as you described in your question), or you simply use a Gaussian response distribution, i.e. lmer() (not sure if this is a good idea in your context), or you keep that log link at the expense of not keeping

the coefficient interpretations, and prediction of the future coefficients, as simple as possible

I am not really sure if this answers your question but what I wanted to say unfortunately didn't fit well as a comment :)

Related Question