I am working with a very unbalanced dataset that represents algae response to pollution. The dataset is a combination of data from several other studies. Algae are expressed as cell abundance counts, so the response ranges from 0 to more than 20,000. Pollution is expressed as z-scores, because I needed to standardize many different variables into a single variable for comparison. My random effect encodes the kind of measurement performed in each study to compare treatments and controls: repeated in time (measured on day 1, then on day 2, etc., at the same place), measured in different places (polluted vs. non-polluted), or not specified.
This is what algae abundance looks like; as can be seen, there are many zeros.
This is what the z-score data look like:
Link to dataset: https://drive.google.com/file/d/1lBrEseqDq4K0pNGp0Gvirn3J8lE-oNDu/view?usp=sharing
I need to measure the effect of pollution on algae considering the random effect, so I'm using this model:
model.abund.phyt <- glmer(response ~ z_scores + (1 | random), data = dataset, family = poisson)
I don't know what is going wrong with this analysis but when I run model.abund.phyt this is what I get:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: poisson ( log )
Formula: response ~ z_scores + (1 | random)
Data: dataset
AIC BIC logLik deviance df.resid
Inf Inf -Inf Inf 441
Random effects:
Groups Name Std.Dev.
random (Intercept) 1
Number of obs: 444, groups: random, 3
Fixed Effects:
(Intercept) z_scores
2.333 1.021
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 22820 optimizer warnings; 1 lme4 warnings
When I run summary() on the model, I get many warnings like this:
non-integer x = 0.210000
What do I need to do to get the model to run correctly?
The numerical variables are recognized as numeric when I run str() on the dataset. Is the problem with my data or with the model specification?
Best Answer
We use random effects to encode information about structure in the data that implies that observations are not independent. For example: repeated measurements taken at the same site, or observations nested within the same study, are expected to be more similar to each other than to the rest of the data.
You don't have this level of information, as you seem to be putting together a meta-dataset that combines data collected under different conditions.
It's not meaningful to conceptualize those different conditions as "random effects". As @EdM advises, it would be better to treat them as fixed effects. This simplifies the model and makes it easier to deal with any remaining errors.
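A minimal sketch of the fixed-effects refit, using simulated stand-in data (the column names `response`, `z_scores`, and `random` are taken from the question; the values here are made up, not your data). Note that the Poisson likelihood is only defined for integer counts, which is what the `non-integer x` warnings in your output are complaining about, so it's worth checking that first:

```r
# Simulated stand-in for the real dataset (hypothetical values;
# column names follow the question).
set.seed(1)
n <- 444
dataset <- data.frame(
  z_scores = rnorm(n),
  random   = factor(sample(c("repeated", "different_places", "not_specified"),
                           n, replace = TRUE))
)
dataset$response <- rpois(n, lambda = exp(1 + 0.5 * dataset$z_scores))

# A Poisson model needs integer counts; non-integers trigger
# "non-integer x" warnings and an infinite deviance.
stopifnot(all(dataset$response == round(dataset$response)))

# Treat the measurement type as a fixed effect instead of a random effect.
fit <- glm(response ~ z_scores + random, data = dataset, family = poisson)
summary(fit)
```

With only 3 levels in the grouping variable, a fixed effect costs just two extra parameters and avoids estimating a variance from far too few groups.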
No matter what model you choose, if you don't have information about the structure of the data (which location were measurements collected from? at what time?), you cannot model correlations between observations appropriately. And if you assume that observations are independent when they are in fact correlated, the inference from your model won't be quite right: p-values too small, confidence intervals too narrow. Be aware of this limitation and don't overinterpret the results.
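To see the direction of that bias, here is a toy simulation (not the question's data): counts are generated with a strong group-level effect, and a group-level covariate with no true effect is fit once ignoring the grouping and once accounting for it. The naive model's standard error is much too small:

```r
library(lme4)

set.seed(42)
n_groups <- 20
n_per    <- 25
g <- factor(rep(seq_len(n_groups), each = n_per))
x <- rep(rnorm(n_groups), each = n_per)   # group-level covariate, no true effect
u <- rnorm(n_groups, sd = 1)              # group-level random intercepts
y <- rpois(n_groups * n_per, exp(0.5 + u[g]))

naive   <- glm(y ~ x, family = poisson)            # ignores the clustering
correct <- glmer(y ~ x + (1 | g), family = poisson)

# The naive model treats 500 correlated observations as independent,
# so its standard error for x is far too small (overconfident p-values).
se_naive   <- summary(naive)$coefficients["x", "Std. Error"]
se_correct <- summary(correct)$coefficients["x", "Std. Error"]
```

The gap between the two standard errors is exactly the "p-values too small, confidence intervals too narrow" problem described above.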