Solved – way to run a multivariate mixed effects model (e.g. RDA) with an unbalanced number of observations per level of the random factor

mixed modelmultivariate analysispermutation-test

I have a community matrix (fish gut microbiome) from a laboratory experiment consisting of 2 treatments, each with 3 independent aquaria, which were sampled 3 times. At each sampling event 6 fish were collected for microbiome sequencing. Since the fish that were sampled from the same tank are not independent from each other, my plan was to use a mixed effects model to test for the effects of treatment and time on microbiome composition with aquarium as random factor, e.g. Y ~ treatment * time + (1|aquarium).

However, while the original sampling design was balanced, it was not possible to obtain data from all sampled fish. Therefore, the number of fish per sampling event and aquarium ranges between 4 – 6. While this does not seem to be a problem in R for a univariate analysis (e.g. using nlme::lme), I could not find a solution for a multivariate response variable.

In the R package vegan, restricted permutation tests for e.g. RDA and PERMANOVA are implemented to account for a dependence of observations by defining which samples will be permuted using the permute::how command. However this only works for balanced data sets, i.e. the same number of observations for each level of the random factor (here: aquaria).

The only option I am aware of to solve this problem, which I would really like avoid, is to delete observation to achieve a balanced design again. Do you know of any other method that also works with unbalanaced data? I welcome any ideas of how to solve this issue.

Thanks!

Best Answer

I only know of one package capable of fitting MV mixed effect regressions; asreml-r, which is a paid package and relatively expensive - I am pretty sure that vegan::adonis2 or the permanova functions are not capable of this; they only do nesting, which is not the same. You could include aquarium as a fixed term to see how much variation actually exists between aquarium though.

The differences in sample size aren't too extraordinary. However, if you are really worried about them you can use multiple imputation (here is a tutorial to do it in R) to generate the missing observations.

What kind of community matrix are you dealing with? Presence-Absence?

Related Question