GLMM – How to Model Nested Fixed-Factor with Generalized Linear Mixed Models

lme4-nlmenested datar

I am a beginner on learning in GLMM and R, please forgive me if I am not making sense or asking something that is very basic.

I am trying to specify a nested fixed-factor in my GLMM model, but I don't seem to find the way to do it. I would really appreciate if someone can answer this question.

My experiment consists of four different species (A,B,C,D), and I have used all possible combinations of the four species as different treatments (composition, Compo, each Compo n = 4 in each ExpRun). Compo is therefore nested within SPrich, because Compo consisting only species A will only be possible under SPrich of 1, but not others. Therefore my data is not balanced under factor SPrich. For example:

SPrich  Compo(abbr.)
0       Control
1       A
1       B
1       C
1       D
2       AB
2       AC
2       AD
2       BC
2       BD
2       CD
3       ABC
...
4       ABCD

I am interested to know whether the species richness SPrich (five levels) and composition Compo (16 levels) has an effect on the response variable WaterChlA, which I measured on different Day from each of my experimental tanks TankNo. WaterChlA is not normally distributed, but more closely resemble a lognormal distribution, so gaussian link log is used. Samples from each TankNo are taken between Day, thus they are not independent. I have conducted the experiment for three times, denoted in factor ExpRun. TankNo are denoted differently in all three ExpRun and therefore a nested random factor.

Here is a section of my data to demonstrate:

> str(Wetland)
'data.frame':   480 obs. of  31 variables:
$ TankNo    : Factor w/ 150 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ ExpRun    : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
$ Day       : Factor w/ 4 levels "1","8","15","22": 1 1 1 1 1 1 1 1 1 1 ...
$ SPrich    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 1 1 1 1 2 2 2 2 2 2 ...
$ FxRich    : Ord.factor w/ 3 levels "0"<"1"<"2": 1 1 1 1 2 2 2 2 2 2 ...
$ Compo     : Factor w/ 16 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
$ MIFI      : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 1 1 ...
$ KAPU      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 2 2 ...
$ DUME      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ POME      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ WaterChlA : num  8.33 1.99 35.56 17.49 22.04 ...

The question comes as my fixed-factor Compo should be nested within SPrich but it seems that it is more common to nest random factors in glmm, and I have not encounter on how to nest fixed-factors. I read in a Github post by Prof Bolker that specifying A/B in R actually equals to 1 + A + A:B, but I wonder if writing that would be enough to specify the nestedness between SPrich and Compo (as below)? If not, what should be the correct way to specify that Compo is nested within SPrich?

I input the following into the model to run glmer with package lme4:

glmm1 <- glmer(WaterChlA ~ ExpRun + Day + SPrich/Compo + ExpRun*Day + ExpRun*SPrich + 
ExpRun*SPrich/Compo + Day*SPrich + Day*SPrich/Compo + (1|ExpRun/TankNo), data = Wetland, 
family = gaussian(link = "log"), na.action = na.exclude)

It said:

fixed-effect model matrix is rank deficient so dropping 404 columns / coefficients

which is understandable because my fixed-factors are not full-rank but nested, so I am not too surprised if it has to drop the non-existing combinations of coefficients. However, it sort of demonstrates that the nestedness isn't really put into consideration by the model specified above(?). It also failed to converge:

Warning messages:
1: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf,  :
  failure to converge in 10000 evaluations
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.112084 (tol = 0.001, component 1)

I want to know whether the way I specify the nested Compo in SPrich is correct. I also wonder if GLMM is the most suitable way to analyze this type of data, or is there any other more suitable tools for this, perhaps GLM would be sufficient to analyze my data? I go for GLMM because there is quite a large variance among different Tanks and thus masking the effect of factors that are of interests, and treating TankNo as a random factor should help lower the influence.

Other than SPrich, I'm also interested to look at how functional richness FxRich affects the response, but Compo is nested within FxRich differently than within SPrich (say Compo A & B is same functional group and C & D is another, then Compo AB will be SPrich 2 and FxRich 1, while Compo AC will be SPrich 2 and FxRich 2, thus the nesting is quite different and complicated), which I think is more sensible to resolve later in another model that do not include SPrich or Compo. I wonder if this is reasonable? Perhaps this should go into another question thread and be discussed later…

Thank you all for you attention! Really appreciate any thoughts on the above!

Best Answer

Stream of consciousness:

  • you might want to consider log-transforming the response (provided there are no zeros) rather than using the log link, i.e. lmer(log(WaterChlA) ~ ...) rather than glmer(WaterChla ~ ..., family=gaussian(link="log")); I say this because log-transforming can take care of heteroscedasticity in the response (specifically a standard deviation approximately proportional to the expected mean value), which the log-link approach doesn't do (also, lmer tends to be a little bit faster and more stable than glmer)
  • Nested fixed effects are indeed hard to specify in linear models (as opposed to ANOVA frameworks), where the underlying framework is explicitly trying to estimate parameters rather than just evaluate sums of squares/proportions of variance explained. If the parameters are not uniquely identifiable, then the model will end up dropping some terms. If your experimental design doesn't allow it, you simply can't estimate an interaction, which is necessary for nested terms. For example, you can't tell how the effect of Compo=AB varies among species richnesses, because Compo=AB only exists when species richness is 2. You have a further problem: you can't even estimate the effect of species richness uniquely when Compo is in the model, because Compo is redundant with species richness (once you know the Compo, you know the species richness). Such redundancy is allowed in variance-decomposition approaches, or in Bayesian models, but not in linear models based on parameter estimation and models descended from them. You could make Compo a random effect (also a good modeling strategy because it has many levels, which will be expensive in terms of degrees of freedom); that would still allow you to quantify the effects.
  • Since you have 480 data points, you should keep in mind the rule of thumb (e.g. from Harrell's Regression Modeling Strategies) that you should not try to fit a model with more than at most n/10 = 48 parameters (that's extreme - typically the rule is stated as n/20, which would mean 24 parameters). Because your predictors are all factors, trying to estimate interactions among them will rapidly make your model size blow up.
  • lme4 is known to give false-positive convergence warnings in some cases, but they look real in this case; I think the problem is that you have a model that is way overspecified/too complex ... see if stripping it down (as suggested below) helps.
  • In general you shouldn't include a categorical variable (factor) as both a fixed effect and a random-effect grouping variable: that's a redundant model specification.

I would say a reasonable start for this model would be

lmer(log(WaterChlA) ~ Day*SPrich + (1|Compo) + (1|ExpRun/TankNo),
 data = Wetland, na.action = na.exclude)

although trying to estimate the interaction between Day and SPrich gives you 20 parameters, which is pushing it a bit. You might consider whether you're willing to convert one or both of those factors to numeric (i.e. only looking for linear trends, or if you used poly(Day,SPrich,degree=2) that would give you quadratic terms with only 5 parameters ...)