Mixed Models – Minimum Repeated Measures and Levels per Nested Random Effect

Tags: lme4-nlme, mixed-model, nested-data, random-effects-model, repeated-measures

I often read the guideline that a random factor should have at least 5-6 levels. However, it is not yet clear to me whether there is (i) a minimum number of levels for a nested factor within the factor it is nested in and (ii) a minimum number of measurements per individual.
For instance, I have a BACI (Before After Control Impact) model that is specified like this in the lme4 package in R:

lmer(y ~ treatment * period + (1| site/block/subject), data = mydata)

I have

  • 2 treatment groups
  • 2 periods (before and after exposure to the treatment/placebo)
  • 64 subjects with 2 measurements (before and after exposure)
  • 16 blocks (8 per treatment)
  • 8 sites (each with one treated and one control block).

This means I have only

  • 2 repeated measures per subject and
  • 2 blocks per site.

Does the small number of blocks per site and of repeated measures per subject present a problem when the total number of blocks (16) and subjects (64) is large? Or, more generally:

  1. Q1: Is there a minimum number of levels of a nested random factor within the factor it is nested in?

  2. Q2: Is there a minimum number of repeated measures per subject?


My personal (layman’s) opinion:

  • I believe that the small number of blocks per site is not a problem, because this book chapter on models with multiple random effects shows an example of a random factor with 30 levels in total (samples) but only 3 within each block (batch). I also find this intuitive, because I imagine there are still 30 (and not just 3) values from which to estimate the distribution of the effect, even if it has to be estimated relative to the batch. That is just my intuition, however; I have very little understanding of how it actually works. In addition, this article advocates the maximal random-effects structure justified by the design.

  • Similarly, I would believe that it is okay to specify subject as a random factor with only two data points per subject, as long as the factor has many levels.

However, I have no real understanding of this, and a colleague who gives statistical courses advised me against it. Since I have never read a guideline about this, I am asking here.

Best Answer

I agree with your reasoning. It becomes easier to think about once we remember that:

(1| site/block/subject)

is the same as

(1| site) + (1|site:block) + (1|site:block:subject)
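As a rough check of this equivalence, the sketch below fits both specifications to simulated data and compares their log-likelihoods. It assumes lme4 is installed; the data frame d, the effect sizes and the balanced layout of 4 subjects per block are assumptions made for illustration, not the asker's data.

```r
# Sketch only: simulated data, not the asker's. Assumes lme4 is installed.
library(lme4)

set.seed(42)

# Hypothetical balanced design: 8 sites x 2 blocks x 4 subjects x 2 periods = 128 rows
d <- expand.grid(period  = c("before", "after"),
                 subject = 1:4,
                 block   = 1:2,
                 site    = 1:8)
d$treatment <- ifelse(d$block == 1, "control", "impact")
d$site    <- factor(d$site)
d$block   <- factor(d$block)
d$subject <- factor(d$subject)

# Unique IDs per nesting level, used only to simulate the random intercepts
blk  <- interaction(d$site, d$block, drop = TRUE)
subj <- interaction(d$site, d$block, d$subject, drop = TRUE)

d$y <- rnorm(8)[as.integer(d$site)] +            # site intercepts
  rnorm(nlevels(blk))[as.integer(blk)] +         # block-within-site intercepts
  rnorm(nlevels(subj))[as.integer(subj)] +       # subject-within-block intercepts
  0.5 * (d$treatment == "impact" & d$period == "after") +  # assumed BACI effect
  rnorm(nrow(d))                                 # residual noise

# Nested shorthand versus its explicit expansion
m_short <- lmer(y ~ treatment * period + (1 | site/block/subject), data = d)
m_long  <- lmer(y ~ treatment * period +
                  (1 | site) + (1 | site:block) + (1 | site:block:subject),
                data = d)

# The two fits should agree up to numerical tolerance
all.equal(as.numeric(logLik(m_short)), as.numeric(logLik(m_long)),
          tolerance = 1e-4)
```

With simulated data this small, lmer may report a singular fit for one of the variance components; that does not affect the comparison of the two specifications.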

So the guideline on the minimum number of levels only applies to the "top" level, which is site in this case. Here we have 8 sites, so that is OK.

Regardless of how many blocks there are per site or subjects per block, the other two grouping terms, site:block and site:block:subject, cannot have fewer levels than site. Here they have 16 and 64 levels respectively, so all is good.
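The level counts behind this argument can be tallied directly from a design table. The following is a minimal sketch in base R, assuming a balanced layout of 4 subjects per block (64 subjects over 16 blocks); the object names are made up for illustration.

```r
# Hypothetical design table: 8 sites, 2 blocks per site, 4 subjects per block,
# each subject measured in 2 periods
design <- expand.grid(period  = c("before", "after"),
                      subject = 1:4,
                      block   = 1:2,
                      site    = 1:8)

# Grouping factors as the expanded random-effects terms define them
site               <- factor(design$site)
site_block         <- interaction(design$site, design$block, drop = TRUE)
site_block_subject <- interaction(design$site, design$block, design$subject,
                                  drop = TRUE)

# Number of levels per grouping term
n_levels <- c("site"               = nlevels(site),
              "site:block"         = nlevels(site_block),
              "site:block:subject" = nlevels(site_block_subject))
print(n_levels)

# Number of observations per subject
obs_per_subject <- table(site_block_subject)
print(range(obs_per_subject))
```

Under these assumptions the terms have 8, 16 and 64 levels, with 2 observations for every subject.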