Mixed Models – Intuition on Parameter Estimation: Variance Parameters vs. Conditional Modes

Tags: blup, intuition, mixed model

I have read many times that random effects (BLUPs/conditional modes for, say, subjects) are not parameters of a linear mixed effects model but instead can be derived from the estimated variance/covariance parameters. E.g. Reinhold Kliegl et al. (2011) state:

Random effects are subjects’ deviations from the grand mean RT and
subjects’ deviations from the fixed-effect parameters. They are
assumed to be independently and normally distributed with a mean of 0.
It is important to recognize that these random effects are not
parameters of the LMM – only their variances and covariances are.
[…] LMM parameters in combination with subjects’ data can be used to
generate “predictions” (conditional modes) of random effects for each
subject.

Can someone give an intuitive explanation of how the (co)variance parameters of the random effects can be estimated without actually using/estimating the random effects?

Best Answer

Consider a simple linear mixed model, e.g. a random intercept model where we estimate the dependency of $y$ on $x$ in different subjects, and assume that each subject has their own random intercept: $$y = a + bx + c_i + \epsilon.$$ Here the intercepts $c_i$ are modeled as coming from a Gaussian distribution $$c_i\sim \mathcal N(0, \tau^2)$$ and the random noise is also Gaussian: $$\epsilon \sim \mathcal N(0, \sigma^2).$$ In lme4 syntax this model would be written as y ~ x + (1|subject).
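To make this concrete, here is a minimal simulation sketch: data generated from the model above and fitted with lme4's lmer. The object names (N, M, dat, fit, ...) are arbitrary and used only for illustration.

```r
## Minimal sketch: simulate from the random-intercept model above and fit it.
library(lme4)

set.seed(1)
N <- 20    # number of subjects
M <- 10    # measurements per subject
a <- 1; b <- 0.5; tau <- 2; sigma_eps <- 1

subj_id <- rep(1:N, each = M)
x   <- rnorm(N * M)
c_i <- rnorm(N, mean = 0, sd = tau)                  # unobserved random intercepts
y   <- a + b * x + c_i[subj_id] + rnorm(N * M, sd = sigma_eps)
dat <- data.frame(y = y, x = x, subject = factor(subj_id))

fit <- lmer(y ~ x + (1 | subject), data = dat)
summary(fit)   # reports a, b and the variance parameters tau^2 and sigma^2
```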

It is instructive to rewrite the above as follows:

\begin{gather} y \mid c \sim \mathcal N(a + bx + c, \sigma^2) \\ c \sim \mathcal N(0, \tau^2) \end{gather}

This is a more formal way to specify the same probabilistic model. From this formulation we can directly see that the random effects $c_i$ are not "parameters": they are unobserved random variables. So how can we estimate the variance parameters without knowing the values of $c$?

Note that the first equation above describes the conditional distribution of $y$ given $c$. If we know the distribution of $c$ and of $y\mid c$, then we can work out the unconditional (marginal) distribution of $y$ by integrating over $c$; this is the law of total probability. If both distributions are Gaussian, then the resulting unconditional distribution is also Gaussian.
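Spelling out that integration step for a single observation, using the notation from above:

$$p(y) \;=\; \int p(y \mid c)\, p(c)\, \mathrm dc \;=\; \int \mathcal N(y \mid a + bx + c,\, \sigma^2)\, \mathcal N(c \mid 0,\, \tau^2)\, \mathrm dc \;=\; \mathcal N(y \mid a + bx,\, \sigma^2 + \tau^2).$$

Equivalently: $y = a + bx + c + \epsilon$ is a sum of independent Gaussian terms, so the means add and the variances add.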

In this case the unconditional distribution is simply $\mathcal N(a + bx, \sigma^2+\tau^2)$, but our observations are not i.i.d. samples from it because there are multiple measurements per subject. To proceed, we need to consider the distribution of the whole $n$-dimensional vector $\mathbf y$ of all observations: $$\mathbf y \sim \mathcal N(a+b\mathbf x, \boldsymbol\Sigma),$$ where (with $N$ subjects, $M$ measurements per subject, and $n = NM$) $\boldsymbol\Sigma=\sigma^2 \mathbf I_n + \tau^2\, \mathbf I_N \otimes \mathbf 1_M$ and $\mathbf 1_M$ is the $M\times M$ matrix of ones. In words, $\boldsymbol\Sigma$ is block-diagonal with one $M\times M$ block per subject: within a block, the diagonal entries are $\sigma^2+\tau^2$ and the off-diagonal entries are $\tau^2$, because measurements from the same subject share the same random intercept. You asked for intuition, so I want to avoid the math. The important point is that this equation does not contain $c$ anymore! This is what one actually fits to the observed data, and that is why one says that the $c_i$ are not parameters of the model.
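If you do want the math in runnable form, here is a sketch continuing the simulation above; marginal_loglik is a made-up helper, not an lme4 function. It builds $\boldsymbol\Sigma$ explicitly and evaluates the Gaussian log-likelihood of the whole vector $\mathbf y$ for given parameter values. This is the $c$-free objective that maximum likelihood works with; lme4 optimizes essentially the same quantity, only via a far more efficient computation (and with REML by default).

```r
## Continuing the sketch above (uses dat and subj_id defined earlier).
## marginal_loglik is a made-up helper for illustration, not an lme4 function.
marginal_loglik <- function(a, b, sigma2, tau2, y, x, subj_id) {
  n  <- length(y)
  ## tau^2 wherever two observations share a subject, sigma^2 added on the diagonal
  Sigma  <- tau2 * outer(subj_id, subj_id, "==") + sigma2 * diag(n)
  mu     <- a + b * x
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  quad   <- drop(crossprod(y - mu, solve(Sigma, y - mu)))
  -0.5 * (n * log(2 * pi) + logdet + quad)
}

## Evaluated at the true simulation values; maximum likelihood maximizes this
## over (a, b, sigma^2, tau^2). Note that no c_i appears anywhere.
marginal_loglik(a = 1, b = 0.5, sigma2 = 1, tau2 = 4,
                y = dat$y, x = dat$x, subj_id = subj_id)
```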

Once the parameters $a$, $b$, $\tau^2$, and $\sigma^2$ have been estimated, one can work out the conditional distribution of $c_i$ given the data, for each $i$. What you see in the mixed model output are the modes of these distributions, also known as the conditional modes (for a Gaussian model the mode coincides with the conditional mean).
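For this balanced random-intercept model the conditional distribution can even be written down explicitly (a standard result; $M$ again denotes the number of measurements per subject): given the fitted parameters, the mode of $c_i \mid \mathbf y$ is the subject's average residual from the fixed part, shrunken toward zero,

$$\hat c_i \;=\; \frac{M\hat\tau^2}{M\hat\tau^2 + \hat\sigma^2}\;\cdot\;\frac{1}{M}\sum_{j=1}^{M}\bigl(y_{ij} - \hat a - \hat b\, x_{ij}\bigr).$$

Continuing the simulation sketch, this can be checked against what lme4 reports via ranef():

```r
## Conditional modes reported by lme4 ...
cond_modes <- ranef(fit)$subject[["(Intercept)"]]

## ... equal the shrunken per-subject mean residuals computed from the fitted
## fixed effects and variance parameters (balanced design, M obs per subject).
tau2_hat   <- as.numeric(VarCorr(fit)$subject)   # estimated tau^2
sigma2_hat <- sigma(fit)^2                       # estimated sigma^2
res_fixed  <- dat$y - fixef(fit)[1] - fixef(fit)[2] * dat$x
shrunk     <- as.numeric(tapply(res_fixed, dat$subject, mean)) *
              M * tau2_hat / (M * tau2_hat + sigma2_hat)

all.equal(cond_modes, shrunk, tolerance = 1e-6)  # should be TRUE up to numerical precision
```

The shrinkage factor $M\hat\tau^2/(M\hat\tau^2 + \hat\sigma^2)$ also carries the intuition: when the data are noisy ($\hat\sigma^2$ large relative to $M\hat\tau^2$), the conditional modes are pulled strongly toward the population mean of 0.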
