Solved – Intraclass Correlation Coefficients (ICC) with Multiple Variables

Tags: intraclass-correlation, mixed-model

Suppose I have measured some variable in siblings, which are nested within families. The data structure looks like this:

family sibling value
------ ------- -----
1      1       y_11
1      2       y_12
2      1       y_21
2      2       y_22
2      3       y_23
...    ...     ...

I want to know the correlation between measurements taken on siblings within the same family. The usual way of doing that is to calculate the ICC based on a random-intercept model:

library(nlme)
res <- lme(yij ~ 1, random = ~ 1 | family, data=dat)
getVarCov(res)[[1]] / (getVarCov(res)[[1]] + res$sigma^2)

This would be equivalent to:

res <- gls(yij ~ 1, correlation = corCompSymm(form = ~ 1 | family), data=dat)

except that the latter approach also allows for a negative ICC.
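For completeness, that compound-symmetry correlation (the ICC) can be read directly off the gls fit via the corCompSymm parameter. A minimal sketch on simulated data (the family-effect and residual variances below are made up for illustration):

```r
library(nlme)

set.seed(42)
# toy data: 50 families with 2-3 siblings each, sharing a family effect
fam <- rep(1:50, times = sample(2:3, 50, replace = TRUE))
dat <- data.frame(family = fam,
                  yij = rnorm(50)[fam] + rnorm(length(fam)))

res <- gls(yij ~ 1, correlation = corCompSymm(form = ~ 1 | family), data = dat)

# the ICC is the single correlation parameter of the corCompSymm structure
icc <- as.numeric(coef(res$modelStruct$corStruct, unconstrained = FALSE))
icc
```

Unlike the variance-components ratio from lme, this estimate is free to go negative.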

Now suppose I have measured three items in siblings nested within families. So, the data structure looks like this:

family sibling item value
------ ------- ---- -----
1      1       1    y_111
1      1       2    y_112
1      1       3    y_113
1      2       1    y_121
1      2       2    y_122
1      2       3    y_123
2      1       1    y_211
2      1       2    y_212
2      1       3    y_213
2      2       1    y_221
2      2       2    y_222
2      2       3    y_223
2      3       1    y_231
2      3       2    y_232
2      3       3    y_233
...    ...     ...  ...

Now, I want to find out about:

  1. the correlation between measurements taken on siblings within the same family for the same item
  2. the correlation between measurements taken on siblings within the same family for different items

If I only had pairs of siblings within families, I would just do:

res <- gls(yijk ~ item, correlation = corSymm(form = ~ 1 | family), 
           weights = varIdent(form = ~ 1 | item), data=dat)

which gives me a $6 \times 6$ var-cov matrix on the residuals of the form:

$\left[\begin{array}{ccc|ccc}
\sigma^2_1 & \rho_{12} \sigma_1 \sigma_2 & \rho_{13} \sigma_1 \sigma_3 & \phi_{11} \sigma^2_1 & \phi_{12} \sigma_1 \sigma_2 & \phi_{13} \sigma_1 \sigma_3 \\
& \sigma^2_2 & \rho_{23} \sigma_2 \sigma_3 & & \phi_{22} \sigma^2_2 & \phi_{23} \sigma_2 \sigma_3 \\
& & \sigma^2_3 & & & \phi_{33} \sigma^2_3 \\ \hline
& & & \sigma^2_1 & \rho_{12} \sigma_1 \sigma_2 & \rho_{13} \sigma_1 \sigma_3 \\
& & & & \sigma^2_2 & \rho_{23} \sigma_2 \sigma_3 \\
& & & & & \sigma^2_3 \\
\end{array}\right]$

based on which I could easily estimate those cross-sibling correlations (the $\phi_{jj}$ values are the ICCs for the same item; the $\phi_{jj'}$ values are the ICCs for different items). However, as shown above, for some families I have only two siblings, but for other families more than two. So, that makes me think that I need to get back to a variance-components type of model. However, the correlation between items may be negative, so I do not want to use a model that constrains the correlations to be positive.
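To make the mapping between the block matrix and the correlations concrete, here is a small numeric sketch (all parameter values made up) that builds the $6 \times 6$ matrix for one two-sibling family and recovers the $\phi$ values with cov2cor:

```r
# made-up parameter values for illustration
s <- c(1.0, 1.5, 2.0)                                    # sigma_1..sigma_3
R <- matrix(c(1, .3, .2, .3, 1, .4, .2, .4, 1), 3)       # within-sibling rho_jj'
P <- matrix(c(.5, .1, .1, .1, .4, .2, .1, .2, .3), 3)    # cross-sibling phi_jj'

D <- diag(s)
V <- rbind(cbind(D %*% R %*% D, D %*% P %*% D),          # full 6x6 matrix for
           cbind(D %*% t(P) %*% D, D %*% R %*% D))       # a two-sibling family

# the off-diagonal block of the correlation matrix returns phi_jj and phi_jj'
cov2cor(V)[1:3, 4:6]
```

The diagonal of that off-diagonal block gives the same-item ICCs and the off-diagonal entries the different-item ICCs, which is exactly what corSymm/varIdent estimate when every family has two siblings.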

Any ideas/suggestions of how I could approach this? Thanks in advance for any help!

Best Answer

The package MCMCglmm can easily handle and estimate covariance structures and random effects. However, it uses Bayesian statistics, which can be intimidating to new users. See the MCMCglmm Course Notes for a thorough guide to MCMCglmm, and chapter 5 in particular for this question. I strongly recommend reading up on assessing model convergence and chain mixing before analysing data for real with MCMCglmm.

library(MCMCglmm)

MCMCglmm uses priors; this is an uninformative inverse-Wishart prior:

p <- list(G = list(G1 = list(V = diag(2), nu = 0.002)),
          R = list(V = diag(2), nu = 0.002))

Fit the model

m <- MCMCglmm(cbind(x, y) ~ trait - 1,
              # trait-1 gives each variable a separate intercept
              random = ~ us(trait):group,
              # a separate random intercept for each variable, with the
              # covariance between them allowed for and estimated
              rcov = ~ us(trait):units,
              # a separate residual variance for each trait, with the
              # covariance between them estimated
              family = c("gaussian", "gaussian"), prior = p, data = df)

In the model summary, summary(m), the G structure describes the variances and covariances of the random intercepts. The R structure describes the observation-level variances and covariances, which function as residuals in MCMCglmm.

If you are of a Bayesian persuasion, you can get the entire posterior distribution of the co/variance terms from m$VCV. Note that these are variances after accounting for the fixed effects.
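Because each row of m$VCV is one posterior draw of all the co/variance terms, the ICCs from the question can be computed draw by draw. A hedged sketch on a mock posterior matrix (in real use, replace the mock with m$VCV; the column names below follow MCMCglmm's usual naming for us(trait):group and us(trait):units terms, but check colnames(m$VCV) on your own fit):

```r
set.seed(1)
# mock posterior draws standing in for two columns of m$VCV: the group-level
# and residual variances of trait x (a real m$VCV also has the covariance
# and trait-y columns)
VCV <- cbind("traitx:traitx.group" = rlnorm(1000, log(3),  0.1),
             "traitx:traitx.units" = rlnorm(1000, log(10), 0.1))

# posterior distribution of the same-variable ICC for x:
# group variance over total variance, computed per draw
icc_x <- VCV[, "traitx:traitx.group"] /
         (VCV[, "traitx:traitx.group"] + VCV[, "traitx:traitx.units"])

mean(icc_x)                        # posterior mean of the ICC
quantile(icc_x, c(0.025, 0.975))   # 95% credible interval
```

The different-item ICCs work the same way, using the cross-trait group covariance divided by the product of the two total standard deviations.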

Simulate data

library(MASS)
n<-3000

# draws from a bivariate normal distribution
df <- data.frame(mvrnorm(n, mu = c(10, 20),  # the intercepts of x and y
                         Sigma = matrix(c(10, -3, -3, 2), ncol = 2)))
# the residual variance-covariance of x and y


# assign each observation to a random-effect level
number_of_groups <- 100
df$group <- rep(1:number_of_groups, length.out = n)
group_var <- data.frame(mvrnorm(number_of_groups, mu = c(0, 0),
                                Sigma = matrix(c(3, 2, 2, 5), ncol = 2)))
# the variance-covariance matrix of the random effects:
# c(variance of x, covariance of x and y, covariance of x and y, variance of y)

# x and y are the sum of the observation-level draws and the group random effect
df$x <- df$X1 + group_var[df$group, 1]
df$y <- df$X2 + group_var[df$group, 2]

Estimating the original co/variances of the random effects requires a large number of levels of the random effect. Instead, your model will likely estimate the observed co/variances, which can be calculated with cov(group_var).
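Under the simulation above, the population ICCs the model is targeting can be worked out directly from the two Sigma matrices, which is a useful sanity check against the posterior estimates:

```r
G <- matrix(c(3, 2, 2, 5), ncol = 2)     # random-effect covariance (as simulated)
R <- matrix(c(10, -3, -3, 2), ncol = 2)  # residual covariance (as simulated)

# same-variable ICCs: group variance over total variance
icc_x <- G[1, 1] / (G[1, 1] + R[1, 1])   # 3 / 13
icc_y <- G[2, 2] / (G[2, 2] + R[2, 2])   # 5 / 7

# cross-variable, cross-sibling correlation: only the group-level covariance
# is shared across siblings, scaled by the two total standard deviations
icc_xy <- G[1, 2] / sqrt((G[1, 1] + R[1, 1]) * (G[2, 2] + R[2, 2]))

c(icc_x, icc_y, icc_xy)
```

With many groups, the posterior means of the corresponding ratios of m$VCV columns should land near these values.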