R lme4-nlme – Resolving Errors in glmer Model Implementation


I am working with a very unbalanced dataset on the response of algae to pollution. The dataset combines data from several other studies. Algae are recorded as cell abundance counts, so the response ranges from 0 to more than 20000. Pollution is expressed as z-scores because I needed to standardize many different variables into a single comparable one. My random effect describes the kind of measurement each study used to compare treatments and controls: repeated in time (measured on day 1, then on day 2, etc., at the same place), taken in different places (polluted vs. non-polluted), or not specified.

This is what algae abundance looks like. As can be noticed, there are many zeros.

[Image: plot of algae abundance]

This is what the z-score data look like:

[Image: plot of z-scores]

Link to dataset: https://drive.google.com/file/d/1lBrEseqDq4K0pNGp0Gvirn3J8lE-oNDu/view?usp=sharing

I need to measure the effect of pollution on algae considering the random effect, so I'm using this model:

model.abund.phyt <- glmer(response ~ z_scores + (1 | random), data = dataset, family = "poisson")

I don't know what is going wrong with this analysis, but when I print model.abund.phyt, this is what I get:

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: poisson  ( log )
Formula: response ~ z_scores + (1 | random)
   Data: dataset
     AIC      BIC   logLik deviance df.resid
     Inf      Inf     -Inf      Inf      441
Random effects:
 Groups Name        Std.Dev.
 random (Intercept) 1
Number of obs: 444, groups:  random, 3
Fixed Effects:
(Intercept)     z_scores
      2.333        1.021
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 22820 optimizer warnings; 1 lme4 warnings

When I run summary() on the model, I get several lines of errors like this:

non-integer x = 0.210000

What do I need to do to get the model to run correctly?
The numerical variables are recognized as numeric when I run str() on the dataset. Is there a problem with my data or with the model specification?

Best Answer

We use random effects to encode information about structure in the data that implies that observations are not independent. For example:

  • We have measurements from different sites in the same geographic area and we expect that two measurements from the same area are more similar than measurements from two different areas (with the same values for the predictor variables). We can deal with this by adding area random effects.
  • We have repeated measurements from the same site at various points in time and we expect that measurements from the same site at two different times are more similar than measurements from different sites. We can deal with this by adding site random effects.
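As a minimal sketch of the second case, the code below simulates repeated counts at several sites and fits a Poisson model with a site random intercept using lme4. The data frame sim, its columns (site, time, pollution, count), and all parameter values are invented for illustration and do not come from the question's dataset.

```r
library(lme4)

set.seed(1)
n_sites <- 20   # hypothetical number of sites
n_times <- 6    # hypothetical number of visits per site

# One row per site-by-time combination.
sim <- expand.grid(site = factor(seq_len(n_sites)), time = seq_len(n_times))
sim$pollution <- rnorm(nrow(sim))

# Each site gets its own offset on the log scale; this is what makes
# measurements from the same site more alike than measurements across sites.
site_effect <- rnorm(n_sites, sd = 0.5)
log_mean <- 1 + 0.3 * sim$pollution + site_effect[as.integer(sim$site)]
sim$count <- rpois(nrow(sim), lambda = exp(log_mean))

# Poisson GLMM with a random intercept per site.
fit_site <- glmer(count ~ pollution + (1 | site), data = sim, family = poisson)

fixef(fit_site)    # estimated intercept and pollution slope
VarCorr(fit_site)  # estimated between-site standard deviation
```

The grouping factor here has 20 levels, each identifying a concrete unit that was measured repeatedly. That is the kind of structure a random intercept is meant to capture.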

You don't have this level of information, as you seem to be putting together a meta-dataset by combining data collected under different conditions.

It's not meaningful to conceptualize the different conditions as "random effects". As @EdM advises, it would be better to treat them as fixed effects. This simplifies the model and makes it easier to deal with other errors, if any.
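A minimal sketch of that suggestion, assuming the column names from the question (response, z_scores, random). The data frame below is a simulated stand-in, not the real dataset, and the level labels are made up. The integer check is included because the "non-integer x" messages shown in the question are what R's Poisson density reports when it is given a value that is not a whole number.

```r
set.seed(42)

# Hypothetical stand-in for the real dataset, reusing its column names.
dataset <- data.frame(
  z_scores = rnorm(300),
  random   = factor(sample(c("time", "space", "unspecified"), 300, replace = TRUE))
)
dataset$response <- rpois(300, lambda = exp(2 + 0.5 * dataset$z_scores))

# A Poisson likelihood is defined for non-negative integers only,
# so count how many response values are not whole numbers.
n_non_integer <- sum(dataset$response != round(dataset$response))

# Measurement design entered as a fixed effect (a three-level factor)
# instead of a random intercept.
fit_fixed <- glm(response ~ z_scores + random, data = dataset, family = poisson)
summary(fit_fixed)$coefficients
```

If n_non_integer is greater than zero on the real data, the response is not a pure count, and a Poisson family may not be appropriate without further changes.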

No matter what model you choose, if you don't have information about the structure of the data (which location were the measurements collected from? at what time?), you cannot model correlations between observations appropriately. And if you assume that observations are independent when they are in fact correlated, the inference from your model will be overconfident: p-values will be too small and confidence intervals too narrow. Be aware of this limitation and don't overinterpret the results.
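To illustrate the last point, here is a small simulation sketch, not an analysis of the question's data. It generates counts that are correlated within sites, with pollution status varying between sites, and compares the standard error of the pollution effect from a model that ignores the sites with one that includes a site random intercept. All names and parameter values are hypothetical.

```r
library(lme4)

set.seed(7)
n_sites    <- 30  # hypothetical number of sites
n_per_site <- 10  # hypothetical number of measurements per site

site <- factor(rep(seq_len(n_sites), each = n_per_site))
# Pollution status is a property of the site: alternating 0/1 across sites.
polluted <- rep(rep(c(0, 1), length.out = n_sites), each = n_per_site)
# A shared site-level offset induces correlation within each site.
site_effect <- rep(rnorm(n_sites, sd = 0.8), each = n_per_site)
count <- rpois(n_sites * n_per_site, lambda = exp(1 + 0.2 * polluted + site_effect))
sim <- data.frame(site, polluted, count)

# Model that treats all 300 observations as independent.
fit_naive <- glm(count ~ polluted, data = sim, family = poisson)
# Model that accounts for the grouping by site.
fit_mixed <- glmer(count ~ polluted + (1 | site), data = sim, family = poisson)

se_naive <- coef(summary(fit_naive))["polluted", "Std. Error"]
se_mixed <- coef(summary(fit_mixed))["polluted", "Std. Error"]
c(naive = se_naive, mixed = se_mixed)
```

In this setup the naive model reports the smaller standard error, because it counts every observation as independent evidence even though the effective sample size is closer to the number of sites.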