Solved – When the dependent variable and random effects ‘overlap’ in mixed effects models

lme4-nlme, mixed model, phylogeny, r

I have added a new example here for clarity; see the original question below.

E.g. I have 10 schools in 5 countries, and ten students are sampled from each school.

Predictor variables: student test marks for Language, Math and Science
Response variable: school fee

I want to know which subject (e.g. Math) correlates with the school fees.

lmer(fees ~ math + language + science + (1 | country/school))   (each row is a student)

But now every student within the same school has the same fee, and school is also included as a random effect. Is this allowed? Should I just take the average subject marks per school and drop the school random effect? See the original question below.


I have a dependent variable that depends on one of my random effects, as such:

Dep   R1   R2   X1   X2   X3
30    a    g    4    43   21
30    a    g    7    46   18
20    b    g    5    31   22
20    b    g    4    37   17
60    c    h    9    50   26
60    c    h    7    34   21

lmer(Dep~X1+X2+X3+(1|R2/R1))   (R2=Genus, R1=Species)

I need the random effect, as I have independent data for each specimen, but I know this setup cannot be correct. Plus some of my models fail to converge. I can use the average values of traits for each R1 and then drop the R1 random effect, but then I lose lots of data.

Can I use a linear mixed effects model for this, or should I be using another technique?


I have since decided to use a phylogeny with a PGLS, because taxonomic level random effects are too rough.

At the moment I am looking into pgls.Ives in phytools to account for within species level variation (see Helmus, M. R., Bland, T. J., Williams, C. K., & Ives, A. R. (2007). Phylogenetic measures of biodiversity. The American Naturalist, 169(3)).

Best Answer

I appreciate the school example, but for simplicity I will stay with the original example, which was:

lmer(Dep~X1+X2+X3+(1|R2/R1)) (R2=Genus, R1=Species)

You make two comments:

  1. I can use the average values of traits for each R1 and then drop the R1 random effect, but then I lose lots of data

  2. Response variable has no variation within species

So, within each group of R1, there is no difference in the response despite variation in the fixed effects. This may or may not be the reason for your identifiability problems. In any case, you have a very high chance of wrongly attributing variation in the response to either the fixed or the random effects.
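
A quick way to confirm this situation before fitting anything is to look at the spread of the response within each R1 group. The following is only a minimal sketch in plain Python, using the six rows from the question's table; the variable names are illustrative and not taken from any package.

```python
from collections import defaultdict
from statistics import pvariance

# Rows from the question's table: (Dep, R1, R2, X1, X2, X3)
rows = [
    (30, "a", "g", 4, 43, 21),
    (30, "a", "g", 7, 46, 18),
    (20, "b", "g", 5, 31, 22),
    (20, "b", "g", 4, 37, 17),
    (60, "c", "h", 9, 50, 26),
    (60, "c", "h", 7, 34, 21),
]

# Collect the response values observed within each species (R1)
dep_by_species = defaultdict(list)
for dep, r1, *_ in rows:
    dep_by_species[r1].append(dep)

# Population variance of Dep within each species
within_var = {sp: pvariance(vals) for sp, vals in dep_by_species.items()}
print(within_var)
```

If every within-group variance is zero, as it is for these rows, the response is a species-level quantity and the specimen-level rows add no information about it.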

To solve this issue, I would probably go with your comment 1 after all, i.e. averaging the trait values. If the response doesn't change within species, there is nothing to be learned from the within-species variability, so you are not losing information.
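
The averaging step could look roughly like the sketch below. It is written in plain Python on the rows from the question's table, and the dictionary layout and names are assumptions made for illustration. The sample size per species is kept alongside the means because it is needed later to judge how uncertain each mean is.

```python
from collections import defaultdict
from statistics import mean

# Rows from the question's table: (Dep, R1, R2, X1, X2, X3)
rows = [
    (30, "a", "g", 4, 43, 21),
    (30, "a", "g", 7, 46, 18),
    (20, "b", "g", 5, 31, 22),
    (20, "b", "g", 4, 37, 17),
    (60, "c", "h", 9, 50, 26),
    (60, "c", "h", 7, 34, 21),
]

# Group specimen-level observations by (genus, species)
groups = defaultdict(list)
for dep, r1, r2, x1, x2, x3 in rows:
    groups[(r2, r1)].append((dep, x1, x2, x3))

# One row per species: the constant response plus the mean of each trait
species_means = {}
for (genus, species), obs in groups.items():
    if len({o[0] for o in obs}) != 1:
        raise ValueError(f"Dep varies within species {species}")
    species_means[species] = {
        "genus": genus,
        "Dep": obs[0][0],
        "X1": mean(o[1] for o in obs),
        "X2": mean(o[2] for o in obs),
        "X3": mean(o[3] for o in obs),
        "n": len(obs),  # specimens behind each mean
    }

for species, summary in species_means.items():
    print(species, summary)
```

The reduced table has one row per species, so a model fitted to it would keep at most the genus-level random effect, e.g. something like lmer(Dep~X1+X2+X3+(1|R2)).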

However, note that the averaged X1, X2, X3 are then estimates from a distribution, and thus have an error. Error in the predictors can bias regression slopes. You should consider using a method that accounts for errors in variables, such as a model II regression. I would think the most convenient way to do this is a Bayesian solution, see e.g. http://mbjoseph.github.io/blog/2013/05/27/typeII/
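
To illustrate why this matters: for a single predictor with independent, classical measurement error, the ordinary least squares slope shrinks towards zero by roughly the factor var(x) / (var(x) + var(error)). The simulation below is a sketch under exactly those assumptions. All numbers are made up, and it is not the model II or Bayesian approach itself, only a demonstration of the bias those methods try to correct.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (single predictor)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


n = 5000
true_slope = 2.0

# True predictor (variance 1) and a response that depends on it
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + random.gauss(0, 0.5) for xi in x_true]

# Observed predictor = truth + measurement error (variance 1)
x_noisy = [xi + random.gauss(0, 1) for xi in x_true]

slope_clean = ols_slope(x_true, y)   # close to the true slope
slope_noisy = ols_slope(x_noisy, y)  # attenuated towards zero

# Theoretical attenuation: var(x) / (var(x) + var(error)) = 1 / (1 + 1)
expected_attenuated = true_slope * 1.0 / (1.0 + 1.0)

print(slope_clean, slope_noisy, expected_attenuated)
```

With several correlated predictors, as in the model here, the bias is not guaranteed to point towards zero, which is one more reason to model the measurement error explicitly.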

Addition: if you desperately want to include phylogenetic information at the species level, you could use a) PGLS (e.g. http://link.springer.com/chapter/10.1007%2F978-3-662-43550-2_5), which accounts for phylogenetic signal in the residuals, or b) some mixed model where phylogenetic distance informs the covariance structure of the random effects. An example of the latter (admittedly not exactly what you want) is Ives, A. R. & Helmus, M. R. (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs, 81, 511-525.
