Suppose I have measured some variable in siblings, which are nested within families. The data structure looks like this:
family  sibling  value
------  -------  -----
     1        1  y_11
     1        2  y_12
     2        1  y_21
     2        2  y_22
     2        3  y_23
   ...      ...   ...
I want to know the correlation between measurements taken on siblings within the same family. The usual way of doing that is to calculate the ICC based on a random-intercept model:
library(nlme)
res <- lme(yij ~ 1, random = ~ 1 | family, data=dat)
getVarCov(res)[[1]] / (getVarCov(res)[[1]] + res$sigma^2)
This would be equivalent to:
res <- gls(yij ~ 1, correlation = corCompSymm(form = ~ 1 | family), data=dat)
except that the latter approach also allows for a negative ICC.
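To make the equivalence concrete, here is a runnable sketch with simulated data (the number of families, the variance components, and the seed are made up for illustration):

```r
library(nlme)

## Simulate the sibling structure: 200 families with 2-3 siblings each,
## family intercept SD 1, residual SD 2, so the true ICC is 1/(1+4) = 0.20
set.seed(1)
nfam <- 200
nsib <- sample(2:3, nfam, replace = TRUE)
dat  <- data.frame(family = rep(seq_len(nfam), nsib))
dat$sibling <- ave(dat$family, dat$family, FUN = seq_along)
dat$yij <- rnorm(nfam)[dat$family] + rnorm(nrow(dat), 0, 2)

## ICC from the random-intercept model
res1 <- lme(yij ~ 1, random = ~ 1 | family, data = dat)
getVarCov(res1)[[1]] / (getVarCov(res1)[[1]] + res1$sigma^2)

## ICC as the compound-symmetry correlation (can also be negative)
res2 <- gls(yij ~ 1, correlation = corCompSymm(form = ~ 1 | family), data = dat)
coef(res2$modelStruct$corStruct, unconstrained = FALSE)
```

With a positive within-family correlation, the two estimates agree closely; the `gls()` parameterization only matters when the true ICC is near or below zero.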
Now suppose I have measured three items in siblings nested within families. So, the data structure looks like this:
family  sibling  item  value
------  -------  ----  -----
     1        1     1  y_111
     1        1     2  y_112
     1        1     3  y_113
     1        2     1  y_121
     1        2     2  y_122
     1        2     3  y_123
     2        1     1  y_211
     2        1     2  y_212
     2        1     3  y_213
     2        2     1  y_221
     2        2     2  y_222
     2        2     3  y_223
     2        3     1  y_231
     2        3     2  y_232
     2        3     3  y_233
   ...      ...   ...   ...
Now, I want to find out about:
- the correlation between measurements taken on siblings within the same family for the same item
- the correlation between measurements taken on siblings within the same family for different items
If I only had pairs of siblings within families, I would just do:
res <- gls(yijk ~ item, correlation = corSymm(form = ~ 1 | family),
weights = varIdent(form = ~ 1 | item), data=dat)
which gives me a $6 \times 6$ var-cov matrix on the residuals of the form:
$\left[\begin{array}{ccc|ccc}
\sigma^2_1 & \rho_{12} \sigma_1 \sigma_2 & \rho_{13} \sigma_1 \sigma_3 & \phi_{11} \sigma^2_1 & \phi_{12} \sigma_1 \sigma_2 & \phi_{13} \sigma_1 \sigma_3 \\
& \sigma^2_2 & \rho_{23} \sigma_2 \sigma_3 & & \phi_{22} \sigma^2_2 & \phi_{23} \sigma_2 \sigma_3 \\
& & \sigma^2_3 & & & \phi_{33} \sigma^2_3 \\ \hline
& & & \sigma^2_1 & \rho_{12} \sigma_1 \sigma_2 & \rho_{13} \sigma_1 \sigma_3 \\
& & & & \sigma^2_2 & \rho_{23} \sigma_2 \sigma_3 \\
& & & & & \sigma^2_3 \\
\end{array}\right]$
based on which I could easily estimate those cross-sibling correlations (the $\phi_{jj}$ values are the ICCs for the same item; the $\phi_{jj'}$ values are the ICCs for different items). However, as shown above, some families have only two siblings while others have more than two, which suggests I need to return to a variance-components type of model. At the same time, the correlation between items may be negative, so I do not want a model that constrains the correlations to be positive.
Any ideas/suggestions of how I could approach this? Thanks in advance for any help!
Best Answer
The package MCMCglmm can easily handle and estimate covariance structures and random effects. However, it uses Bayesian statistics, which can be intimidating to new users. See the MCMCglmm Course Notes for a thorough guide, and chapter 5 in particular for this question. I strongly recommend reading up on assessing model convergence and chain mixing before analysing real data with MCMCglmm.
MCMCglmm requires priors on the variance components; an uninformative inverse-Wishart prior on each covariance matrix is the usual default choice.
Fit the model
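One way such a fit could look (a sketch, not necessarily the original answer's exact code: it assumes the data have been reshaped so each row is one sibling with item columns `y1`, `y2`, `y3` in a data frame `datw`; the prior settings and variable names are assumptions to adapt to your own data):

```r
library(MCMCglmm)

## Uninformative inverse-Wishart priors on both 3x3 covariance matrices
prior <- list(R = list(V = diag(3), nu = 0.002),
              G = list(G1 = list(V = diag(3), nu = 0.002)))

## Multi-response model: the three items are treated as traits.
## us() fits unstructured covariance matrices, so negative correlations
## between items (and hence negative ICCs) are allowed.
m <- MCMCglmm(cbind(y1, y2, y3) ~ trait - 1,
              random = ~ us(trait):family,  # family-level 3x3 covariance
              rcov   = ~ us(trait):units,   # sibling-level (residual) covariance
              family = rep("gaussian", 3),
              prior  = prior, data = datw, verbose = FALSE)
```

The cross-sibling, same-item ICC for item $j$ is then the family-level variance of item $j$ divided by the sum of its family-level and residual variances; the cross-sibling, cross-item correlations come from the family-level covariances, all computable draw by draw from the posterior.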
In the model summary
summary(m)
the G structure describes the variances and covariances of the random intercepts. The R structure describes the observation-level variances and covariances, which function as residuals in MCMCglmm. If you are of a Bayesian persuasion, you can get the entire posterior distribution of the co/variance terms from
m$VCV
Note that these are variances after accounting for the fixed effects.
Estimating the true co/variances of the random effects requires a large number of levels of the random effect. With fewer levels, your model will instead tend to recover the observed co/variances, which can be calculated with
cov(group_var)