Solved – Number of observations in groups – linear mixed effects model

lme4-nlme, mixed model, r

I would like to fit a linear mixed effects model to my dataset, but I was wondering whether the number of observations per group matters. I have some groups with about 60 observations each, but there are also some with only 1, so I'm curious whether that will have any influence on my linear mixed effects model.

Thanks in advance

Best Answer

No, it will not be a strong influence. Take the standard LME model where $y \sim N(X\beta, ZDZ^T + \sigma^2 I)$. If one assumes a degenerate case where you have an equal number of observations and groups (let's say under a "simple" clustering, no crossed or nested effects etc.), then all your sample variance would be moved into the $D$ matrix, and $\sigma^2$ should be zero. The problem is that you would have as many parameters as data points in a linear model. You would have an over-parametrized model; therefore regression would be a bit nonsensical. Issues of identifiability will also arise.
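To make the degenerate case concrete, here is a short sketch under one extra assumption: a random-intercept-only model, so that $D = \sigma_b^2 I$ (the symbol $\sigma_b^2$ is introduced here purely for illustration). With exactly one observation per group, $Z = I$, and the marginal covariance collapses to

$$\operatorname{Var}(y) = Z D Z^T + \sigma^2 I = \sigma_b^2 I + \sigma^2 I = (\sigma_b^2 + \sigma^2) I .$$

The likelihood then depends on the two variances only through their sum, so the data cannot tell one split from another (putting everything in $D$ with $\sigma^2 = 0$ fits exactly as well as any other split). This is one way to see the identifiability issue mentioned above.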

Luckily you are not in this case. Most of your groups have "enough realizations", so the between-group and residual variance can still be separated. I would suggest trying to fit your model with and without the single-observation groups; you should see negligible difference in the estimated variance parameters. If not, question what is going on with the single-observation groups. Are they sensible? What caused only a single observation to be retained (machine failure? difficulty of measurement? rarity? etc.)
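The "with and without" comparison can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated (20 groups of 60 plus 10 single-observation groups, a random intercept only, no covariates), the function names `simulate` and `anova_variance_components` are made up for this example, and the variance components are estimated with the classical one-way ANOVA (method-of-moments) estimator rather than the REML/ML fit a mixed-model package such as lme4 would give you. With your own data you would refit your actual model on both versions of the dataset instead.

```python
import random
from statistics import mean


def simulate(group_sizes, sd_group=2.0, sd_resid=1.0, seed=1):
    """Simulate y_ij = b_i + e_ij for a random-intercept model (made-up data)."""
    rng = random.Random(seed)
    data = {}
    for g, n in enumerate(group_sizes):
        b = rng.gauss(0.0, sd_group)  # group-level random intercept
        data[g] = [b + rng.gauss(0.0, sd_resid) for _ in range(n)]
    return data


def anova_variance_components(data):
    """Method-of-moments estimates (var_group, var_resid), unbalanced one-way layout.

    This is a stand-in for a proper REML/ML mixed-model fit, not a replacement.
    """
    sizes = [len(ys) for ys in data.values()]
    k, n_total = len(sizes), sum(sizes)
    means = {g: mean(ys) for g, ys in data.items()}
    grand = sum(sum(ys) for ys in data.values()) / n_total

    # Between-group and within-group sums of squares.
    ssb = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in data.items())
    ssw = sum((y - means[g]) ** 2 for g, ys in data.items() for y in ys)

    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    # Effective group size for unbalanced data.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    var_group = max((msb - msw) / n0, 0.0)  # truncate negative estimates at zero
    return var_group, msw


# 20 groups with 60 observations each, plus 10 single-observation groups.
full = simulate([60] * 20 + [1] * 10)
trimmed = {g: ys for g, ys in full.items() if len(ys) > 1}

est_full = anova_variance_components(full)
est_trim = anova_variance_components(trimmed)

print("with singletons:    var_group=%.3f var_resid=%.3f" % est_full)
print("without singletons: var_group=%.3f var_resid=%.3f" % est_trim)
```

For this particular estimator, single-observation groups add nothing to the within-group sum of squares or its degrees of freedom, so the residual variance estimate is the same in both runs. Only the between-group estimate can move, since the singletons count as extra (noisy) group means.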

In general, single-observation groups tend to be a bit messy; to quote D. Bates from the r-sig-mixed-models mailing list:

I think you will find that there is very little difference in the model fits whether you include or exclude the single-observation groups. Try it and see.

(What I am commenting on are LME models; for GLME models the concept of over-dispersion comes into play, and then single-observation groups are not "as problematic as" in an LME model.)