I am a beginner on learning in GLMM and R, please forgive me if I am not making sense or asking something that is very basic.
I am trying to specify a nested fixed-factor in my GLMM model, but I don't seem to find the way to do it. I would really appreciate if someone can answer this question.
My experiment consists of four different species (A,B,C,D), and I have used all possible combinations of the four species as different treatments (composition, Compo
, each Compo
n = 4 in each ExpRun
). Compo
is therefore nested within SPrich
, because Compo
consisting only species A will only be possible under SPrich
of 1, but not others. Therefore my data is not balanced under factor SPrich
. For example:
SPrich Compo(abbr.)
0 Control
1 A
1 B
1 C
1 D
2 AB
2 AC
2 AD
2 BC
2 BD
2 CD
3 ABC
...
4 ABCD
I am interested to know whether the species richness SPrich
(five levels) and composition Compo
(16 levels) has an effect on the response variable WaterChlA
, which I measured on different Day
from each of my experimental tanks TankNo
. WaterChlA
is not normally distributed, but more closely resemble a lognormal distribution, so gaussian link log is used. Samples from each TankNo
are taken between Day
, thus they are not independent. I have conducted the experiment for three times, denoted in factor ExpRun
. TankNo
are denoted differently in all three ExpRun
and therefore a nested random factor.
Here is a section of my data to demonstrate:
> str(Wetland)
'data.frame': 480 obs. of 31 variables:
$ TankNo : Factor w/ 150 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ ExpRun : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
$ Day : Factor w/ 4 levels "1","8","15","22": 1 1 1 1 1 1 1 1 1 1 ...
$ SPrich : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 1 1 1 1 2 2 2 2 2 2 ...
$ FxRich : Ord.factor w/ 3 levels "0"<"1"<"2": 1 1 1 1 2 2 2 2 2 2 ...
$ Compo : Factor w/ 16 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
$ MIFI : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 1 1 ...
$ KAPU : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 2 2 ...
$ DUME : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ POME : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ WaterChlA : num 8.33 1.99 35.56 17.49 22.04 ...
The question comes as my fixed-factor Compo
should be nested within SPrich
but it seems that it is more common to nest random factors in glmm, and I have not encounter on how to nest fixed-factors. I read in a Github post by Prof Bolker that specifying A/B
in R actually equals to 1 + A + A:B
, but I wonder if writing that would be enough to specify the nestedness between SPrich
and Compo
(as below)? If not, what should be the correct way to specify that Compo
is nested within SPrich
?
I input the following into the model to run glmer
with package lme4
:
glmm1 <- glmer(WaterChlA ~ ExpRun + Day + SPrich/Compo + ExpRun*Day + ExpRun*SPrich +
ExpRun*SPrich/Compo + Day*SPrich + Day*SPrich/Compo + (1|ExpRun/TankNo), data = Wetland,
family = gaussian(link = "log"), na.action = na.exclude)
It said:
fixed-effect model matrix is rank deficient so dropping 404 columns / coefficients
which is understandable because my fixed-factors are not full-rank but nested, so I am not too surprised if it has to drop the non-existing combinations of coefficients. However, it sort of demonstrates that the nestedness isn't really put into consideration by the model specified above(?). It also failed to converge:
Warning messages:
1: In (function (fn, par, lower = rep.int(-Inf, n), upper = rep.int(Inf, :
failure to converge in 10000 evaluations
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.112084 (tol = 0.001, component 1)
I want to know whether the way I specify the nested Compo
in SPrich
is correct. I also wonder if GLMM is the most suitable way to analyze this type of data, or is there any other more suitable tools for this, perhaps GLM would be sufficient to analyze my data? I go for GLMM because there is quite a large variance among different Tanks and thus masking the effect of factors that are of interests, and treating TankNo
as a random factor should help lower the influence.
Other than SPrich
, I'm also interested to look at how functional richness FxRich
affects the response, but Compo
is nested within FxRich
differently than within SPrich
(say Compo
A & B is same functional group and C & D is another, then Compo
AB will be SPrich
2 and FxRich
1, while Compo
AC will be SPrich
2 and FxRich
2, thus the nesting is quite different and complicated), which I think is more sensible to resolve later in another model that do not include SPrich
or Compo
. I wonder if this is reasonable? Perhaps this should go into another question thread and be discussed later…
Thank you all for you attention! Really appreciate any thoughts on the above!
Best Answer
Stream of consciousness:
lmer(log(WaterChlA) ~ ...)
rather thanglmer(WaterChla ~ ..., family=gaussian(link="log"))
; I say this because log-transforming can take care of heteroscedasticity in the response (specifically a standard deviation approximately proportional to the expected mean value), which the log-link approach doesn't do (also,lmer
tends to be a little bit faster and more stable thanglmer
)Compo=AB
varies among species richnesses, becauseCompo=AB
only exists when species richness is 2. You have a further problem: you can't even estimate the effect of species richness uniquely whenCompo
is in the model, becauseCompo
is redundant with species richness (once you know theCompo
, you know the species richness). Such redundancy is allowed in variance-decomposition approaches, or in Bayesian models, but not in linear models based on parameter estimation and models descended from them. You could makeCompo
a random effect (also a good modeling strategy because it has many levels, which will be expensive in terms of degrees of freedom); that would still allow you to quantify the effects.lme4
is known to give false-positive convergence warnings in some cases, but they look real in this case; I think the problem is that you have a model that is way overspecified/too complex ... see if stripping it down (as suggested below) helps.I would say a reasonable start for this model would be
although trying to estimate the interaction between
Day
andSPrich
gives you 20 parameters, which is pushing it a bit. You might consider whether you're willing to convert one or both of those factors to numeric (i.e. only looking for linear trends, or if you usedpoly(Day,SPrich,degree=2)
that would give you quadratic terms with only 5 parameters ...)