I am trying to fit a mixed model with about 45 groups. About 10 of the groups have just one observation, and about 10 groups have more than 5 observations. The total number of observations is around 170. Should I be concerned about the estimates? Is there merit in just removing the groups with one observation, or somehow merging them with the other ones? Are there techniques for that?
Mixed Model – Determining the Number of Observations Per Group in Linear Mixed Models
linear, mixed model
Related Solutions
No, it will not be a strong influence. Using the standard LME model where $y \sim N(X\beta, ZDZ^T + \sigma^2 I)$, consider the degenerate case where you have an equal number of observations and groups (say under a "simple" clustering, with no crossed or nested effects, etc.). Then all your sample variance would be moved into the $D$ matrix, and $\sigma^2$ should be zero. The problem is that you will have as many parameters as data points in a linear model. You have an over-parametrized model; therefore the regression will be a bit nonsensical. Issues of identifiability will also arise.
Luckily you are not in this case. That means that in most cases you can achieve separation of variance, as you have "enough realizations" from each group. I would suggest trying to fit your model with and without the single-observation groups; you should see negligible difference in the estimated variance parameters. If not, ask what is going on with the single-observation groups. Are they sensible? What caused a single observation to be retained (machine failure, difficulty of measurement, rarity, etc.)?
In general, single-observation groups tend to be a bit messy; to quote D. Bates from the r-sig-mixed-models mailing list:
I think you will find that there is very little difference in the model fits whether you include or exclude the single-observation groups. Try it and see.
(What I am commenting on are LME models; for GLME models the concept of over-dispersion comes into play, and then single-observation groups are not "as problematic as" in an LME model.)
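The with/without comparison suggested above can be sketched on simulated data. This is only an illustration under stated assumptions: the group sizes mimic the question (45 groups, 170 observations, 10 singletons), the true standard deviations and the seed are made up, and the hypothetical helper `anova_variance_components` uses the classical method-of-moments (ANOVA) estimator for a random-intercept model rather than the ML/REML fit a mixed-model package would perform.

```python
import random


def anova_variance_components(groups):
    """Method-of-moments estimates for y_ij = mu + a_i + e_ij (unbalanced one-way layout).

    Returns (between-group variance estimate, residual variance estimate).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Within-group sum of squares: single-observation groups contribute exactly zero.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)
    # Effective group size for unbalanced data.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    var_group = max(0.0, (ms_between - ms_within) / n0)  # truncate at zero
    return var_group, ms_within


# Hypothetical layout resembling the question: 45 groups, 170 observations.
rng = random.Random(1)
sizes = [1] * 10 + [3] * 20 + [4] * 5 + [8] * 10
true_sd_group, true_sd_resid = 2.0, 1.0  # assumed values, for illustration only

groups = []
for n in sizes:
    effect = rng.gauss(0.0, true_sd_group)
    groups.append([10.0 + effect + rng.gauss(0.0, true_sd_resid) for _ in range(n)])

with_singletons = anova_variance_components(groups)
without_singletons = anova_variance_components([g for g in groups if len(g) > 1])

print("with singletons    (group var, residual var):", with_singletons)
print("without singletons (group var, residual var):", without_singletons)
```

With this particular estimator the residual-variance estimate is identical in both runs, because a single-observation group has no within-group deviation and drops one observation and one group at the same time; only the between-group estimate moves. A likelihood-based fit need not behave exactly the same way, so the real comparison should be run with the model you actually intend to use.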
In general, you have an issue with identifiability. Linear models with a random effect assigned to a parameter with only one measurement can't distinguish between the random effect and the residual error.
A typical linear mixed effect equation will look like:
$E = \beta + \eta_i + \epsilon_j$
where $\beta$ is the fixed effect, $\eta_i$ is the random effect for level $i$, and $\epsilon_j$ is the residual variability for the $j$th measurement. When you have only one observation of a level with a random effect, it is difficult to distinguish between $\eta$ and $\epsilon$. You will (typically) be fitting a variance or standard deviation to $\eta$ and $\epsilon$, so with only one measure per individual, you will not be as certain that you have an accurate estimate for $SD(\eta)$ and $SD(\epsilon)$, but the estimate of the sum of the variances ($var(\eta) + var(\epsilon)$) should be relatively robust.
On to the practical answer: if about 1/3 of your observations come from individuals with a single observation, you are probably OK overall. The rest of the population should provide a reasonable estimate for $SD(\eta)$ and $SD(\epsilon)$, and these individuals should be minor contributors overall. On the other hand, if all individuals at a specific fixed effect and random effect have a single measure (e.g. for your example, perhaps all of a population, which may mean a species for you), then you would trust the result less.
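A short sketch of why the sum is the robust quantity, under the usual assumptions (not stated above) that $\eta_i$ and $\epsilon_j$ are independent with zero mean. Write $E$ and $E'$ for two different measurements taken on the same level $i$:

$$
var(E) = var(\eta) + var(\epsilon), \qquad cov(E, E') = var(\eta)
$$

A level with a single measurement has no pair $(E, E')$, so it carries information only about $var(\eta) + var(\epsilon)$. The covariance between repeated measurements on the same level is what separates $var(\eta)$ from $var(\epsilon)$, and that information comes only from levels with two or more measurements.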
Best Answer
This sample size is rather small for a linear mixed model. Hox (2002) quotes Kreft's (1996) rules of thumb for minimal sample size:
Ten observations per group is a minimal sample size that is often mentioned. The general problem is that 1 or even 5 is a very small sample for a group. Also, your overall sample size of 170 is rather small for estimating a model as complicated as a multilevel model. The smaller the sample, the more biased your results can get.
As an alternative, you can use Bayesian estimation, since it often works well even with small sample sizes. However, with this approach you could end up with estimates being drawn purely from the prior distribution.
You can find more information on preferable sample sizes and power analysis for multilevel modeling in these two books:
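How strongly the prior can dominate a small group can be illustrated with the simplest conjugate case. The sketch below assumes a normal prior on a single group mean and known prior and residual variances, which is a simplification of a full Bayesian multilevel model; the function name `posterior_group_mean` and all numbers are hypothetical.

```python
def posterior_group_mean(obs, prior_mean, prior_var, resid_var):
    """Conjugate normal posterior for one group's mean, assuming known variances.

    Returns (posterior mean, posterior variance, weight given to the data).
    """
    n = len(obs)
    precision = 1.0 / prior_var + n / resid_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(obs) / resid_var)
    data_weight = (n / resid_var) / precision  # share of the estimate coming from the data
    return post_mean, post_var, data_weight


# Assumed values for illustration: prior N(0, 1), residual variance 4, observed values of 3.0.
one_obs = posterior_group_mean([3.0], prior_mean=0.0, prior_var=1.0, resid_var=4.0)
ten_obs = posterior_group_mean([3.0] * 10, prior_mean=0.0, prior_var=1.0, resid_var=4.0)

print("1 observation:   mean=%.3f, data weight=%.3f" % (one_obs[0], one_obs[2]))
print("10 observations: mean=%.3f, data weight=%.3f" % (ten_obs[0], ten_obs[2]))
```

Under these assumed values, a single observation of 3.0 moves the estimate only to 0.6, with 80% of the weight on the prior, while ten such observations pull it to about 2.14. The same mechanism is why single-observation groups in a Bayesian fit mostly reflect the prior rather than their own data.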