I'll answer each of your questions one at a time.
May I ask if the following model is a random-intercept model?
1. There is a common beta for all N individuals.
2. Each group has a different within group regression line (same slope but different intercepts).
3. The regression line within each group crosses the "cloud" consisting of the group members. And the individual residuals scatter around the regression line, within each group.
Conditions (2) and (3) are in conflict with each other. If each group has the same slope but a different intercept, then the within-group regression line will not pass through the cloud of group observations unless the truth is that every group has exactly the same slope. You would need both a random intercept and a random slope on every predictor to guarantee that condition (3) is satisfied.
However, how do I explicitly write out the equation?
The familiar formula for the random effects model, as you pointed out, is
$$ {\bf Y}_i = {\bf X}_i {\boldsymbol \beta} + {\bf Z}_{i} {\bf b}_i + {\boldsymbol \varepsilon}_{i} $$
Where $${\bf Y}_i = \left( \begin{array}{c} y_{i1} \\ y_{i2} \\ \vdots \\ y_{i n_{i}} \end{array} \right) $$ is the vector of responses in group $i$, and $n_{i}$ is the number of observations in group $i$ ($n_i$ can be $1$ for some groups, but not for all of them).
$${\bf X}_i = \left( \begin{array}{ccccc}
1 & x_{i11} & x_{i12} & \cdots & x_{i1p} \\
1 & x_{i21} & x_{i22} & \cdots & x_{i2p} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_{i n_{i} 1} & x_{i n_{i} 2} & \cdots & x_{i n_{i} p} \\
\end{array} \right) $$
is the matrix of the $p$ predictor variables (together with a leading column of $1$s for the intercept) for each observation in group $i$, with corresponding $(p+1)$-length fixed effects regression coefficient vector ${\boldsymbol \beta}$. $${\bf b}_i = \left( \begin{array}{c} b_{i1} \\ b_{i2} \\ \vdots \\ b_{im} \end{array} \right) $$ is the $m$-length vector of random effects, $${\bf Z}_i = \left( \begin{array}{cccc}
z_{i11} & z_{i12} & \cdots & z_{i1m} \\
z_{i21} & z_{i22} & \cdots & z_{i2m} \\
\vdots & \vdots & \ddots & \vdots \\
z_{in_{i} 1} & z_{i n_{i} 2} & \cdots & z_{i n_{i} m} \\
\end{array} \right) $$ is the random effects design matrix for group $i$, and
$$ {\boldsymbol \varepsilon}_i = \left( \begin{array}{c} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{i n_i} \end{array} \right) $$ is the vector of errors. So, for example, if you just had a random intercept and a random slope on the first predictor, then
$${\bf b}_i = \left( \begin{array}{c} b_{i1} \\ b_{i2} \end{array} \right), \quad
{\bf Z}_i = \left( \begin{array}{cc}
1 & x_{i11} \\
1 & x_{i21} \\
\vdots & \vdots \\
1 & x_{i n_{i} 1} \\
\end{array} \right) $$
where $b_{i1}$ is the random intercept and $b_{i2}$ is the random slope. If you only had a random intercept and nothing else, then ${\bf b}_i$ would be a scalar and ${\bf Z}_{i}$ would just be a column of $1$s.
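Written out observation by observation, the random-intercept-only model is then
$$ y_{ij} = \beta_0 + \beta_1 x_{ij1} + \cdots + \beta_p x_{ijp} + b_i + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i, $$
so all groups share the slopes $\beta_1, \ldots, \beta_p$, and group $i$'s line is shifted vertically by its random intercept $b_i$.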
In your particular example, you have a categorical predictor (say, with $k$ levels), which, for modeling purposes, is effectively like having $k-1$ dummy variables that are $1$ if the variable takes on that level and $0$ otherwise. So, your ${\bf X}_{i}$ matrix will have $k+2$ columns - one column of $1$s, two columns with the values of the quantitative predictors, and $k-1$ columns that are $0/1$ indicators of which level the categorical predictor takes. If you are going to include random slopes on every predictor, then ${\bf Z}_{i}$ will be exactly the same as ${\bf X}_{i}$. As mentioned above, if you only want a random intercept in the model, then ${\bf Z}_{i}$ is just a column of $1$s.
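As a sketch of how this looks in lme4's `lmer` (all variable names here are hypothetical placeholders, and the data are simulated just so the code runs):

```r
library(lme4)

# Simulated toy data: 10 groups, 20 observations each, two quantitative
# predictors and a 3-level categorical predictor (hypothetical names)
set.seed(1)
dat <- data.frame(
  group = factor(rep(1:10, each = 20)),
  x1 = rnorm(200),
  x2 = rnorm(200),
  cat = factor(sample(c("a", "b", "c"), 200, replace = TRUE))
)
b <- rnorm(10)  # true random intercepts, one per group
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 +
  (dat$cat == "b") + b[dat$group] + rnorm(200)

# Random intercept only: (1 | group) makes Z_i a column of 1s
fit <- lmer(y ~ x1 + x2 + cat + (1 | group), data = dat)
fixef(fit)  # beta-hat: intercept, x1, x2, and k - 1 = 2 dummy coefficients
```

Note that `lmer` builds the $k-1$ dummy columns for the factor automatically, so the formula lists the factor itself rather than the indicators.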
How do I set up the weights in lme in R?
This depends on what you mean by "weights". Usually this means that certain observations are weighted (e.g. inverse-probability weights to correct for unequal probability sampling) and the criterion function being optimized to produce your estimates (probably the likelihood function in this case) is weighted accordingly. For example, if the groups were sampled with unequal probability, the function
$$ {\bf L} = \sum_{i=1}^{K} w_i L_i $$
where $L_i$ is the group $i$ log-likelihood, may be optimized instead of the unweighted sum. In terms of point estimation, this is effectively the same as replicating group $i$ in the data set a number of times equal to $w_i$, which can be accomplished by doing exactly that: expand the data set based on the weights and fit the model to this expanded data set. I'm not sure if there is functionality in lme to do this automatically, so you may need to do it yourself.
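For integer weights, that expansion is a one-liner in base R (the column names here are hypothetical):

```r
# Hypothetical data: one row per observation, with an integer weight w
# attached to each row (constant within group)
dat <- data.frame(
  group = c("A", "A", "B", "C"),
  y     = c(1.2, 0.7, 3.1, 2.4),
  w     = c(2, 2, 1, 3)
)

# Replicate each row w times, mimicking the weighted log-likelihood
expanded <- dat[rep(seq_len(nrow(dat)), times = dat$w), c("group", "y")]
nrow(expanded)  # 2 + 2 + 1 + 3 = 8 rows
```

Non-integer weights would need to be rescaled and rounded to the nearest integer first, at some cost in precision.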
Regarding weighting within a group (i.e. at the individual level), I do not recommend this in the context of random effects modeling. To see why, note that weighting within a cluster effectively creates exact copies of individuals within that cluster. There will therefore be pockets within the group that are perfectly correlated with each other, so the estimates of the random effect variances will be biased upward, since the model will think that members of a group are more correlated than they actually are.
Comments related to the Edit:
The only way adding a random intercept would make each group's regression line pass through its "cloud" is if each group's "cloud" were just a vertical shift of every other group's - that is, the slopes are exactly the same but the intercepts differ. More generally, the linear least squares line requires both a slope and an intercept. If you don't let the slopes vary, the random intercept will go wherever is "best" (i.e. the posterior mode, if you're estimating the random effects), so I don't think it could possibly end up all the way off to one side of the "cloud".
The central point within each group would presumably be the vector of sample means within the group, although this would require some more thought - as would the comments made in (1) - since we aren't fitting this model by least squares (although it is closely related to least squares, since it involves the Gaussian likelihood).
The only way I can see to visualize this is to plot the points for a given group, along with the fitted line within the group, the same way you would with ordinary regression. You can extract posterior estimates of the random effects using the ranef function with the lmer model fit as its argument, and you can extract the fixed effects in the usual way.
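Concretely, a plot of one group's points and its fitted within-group line could be built like this (toy simulated data with hypothetical names, random-intercept model only):

```r
library(lme4)

# Toy data: 5 groups whose clouds are vertical shifts of each other
set.seed(2)
dat <- data.frame(group = factor(rep(letters[1:5], each = 30)),
                  x = runif(150))
dat$y <- 2 + 3 * dat$x + rnorm(5)[dat$group] + rnorm(150, sd = 0.5)

fit  <- lmer(y ~ x + (1 | group), data = dat)
beta <- fixef(fit)         # common intercept and slope
re   <- ranef(fit)$group   # posterior modes of the random intercepts

# One group's cloud with its fitted within-group line
g <- "a"
plot(y ~ x, data = dat[dat$group == g, ], main = paste("group", g))
abline(a = beta["(Intercept)"] + re[g, "(Intercept)"], b = beta["x"])
```

The within-group line has the shared slope `beta["x"]` and a group-specific intercept shifted by that group's predicted random effect.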
Best Answer
With 16000 genes you will be better off using software packages designed to handle large-scale gene-expression data. The DESeq2 and edgeR Bioconductor packages were designed for working with RNA-seq data, and the venerable limma package, originally designed for spotted microarrays, can work with such data too.
Your gene-by-gene analysis does not model variance as a function of gene expression level, unlike the packages cited above. For example, the DESeq2 package performs negative binomial modeling of RNA-seq counts as a function of both sample-specific and gene-specific factors. That provides better pooled variance estimates for differential expression analysis than does the constant variance on a log scale implicit in your approach, potentially improving power to detect true changes. Those packages can help deal with outliers and handle the multiple-comparisons problem. They accept design matrices in ways that should accommodate your experimental design.
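For orientation, the basic DESeq2 workflow looks like this (the counts here are simulated placeholders and the single-factor design is illustrative, not your actual design):

```r
library(DESeq2)

# Placeholder data: 100 genes x 6 samples, two conditions
set.seed(3)
counts  <- matrix(rpois(600, lambda = 50), nrow = 100,
                  dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
coldata <- data.frame(condition = factor(rep(c("ctrl", "trt"), each = 3)),
                      row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)    # negative binomial GLM fit, gene by gene
res <- results(dds)  # log2 fold changes and adjusted p-values
```

Your design would go into the `design` formula (or an explicit model matrix), as discussed below.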
Model matrix example
Section 3.5 of the edgeR User's Guide has a design addressing essentially the same question as yours. Each individual Patient, having one of 3 Disease types, received both a control and a hormone Treatment. The question there is whether the Disease affects the response to Treatment, like your interest in whether "treat1" affects the response to "treat2"; it has pairing like yours. To get a corresponding design matrix for your study, replace the User Guide's 9-level Patient with your 24-level "family", its 3-level Disease with your 4-level "treat1", and its 2-level Treatment with your 2-level "treat2". That accomplishes, with pairing, the critical part of what your mixed model was doing. That section of the User Guide then shows how to use the resulting model to find genes that respond differentially to combinations of conditions.
The section of the DESeq2 vignette on "Group-specific condition effects, individuals nested within groups" suggests a more efficient model-matrix coding, inheriting from that edgeR method. As the particular "family" names aren't themselves important and no "family" is involved in more than 1 level of "treat1", you can set up an "ind.n" factor that just annotates the 6 separate families within each level of "treat1". Then your model matrix could be based on the formula ~ treat1 + treat1:ind.n + treat1:treat2. The vignette goes on to illustrate how to get comparisons of interest. I haven't carefully thought through the differences between those two suggestions; the point is that these standard packages should be able to answer your fundamental question.
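You can see how that nested coding expands in base R with a scaled-down toy version (2 levels of "treat1", 3 nested families per level, 2 levels of "treat2" - the counts here are illustrative, not your actual design):

```r
# Scaled-down toy design: 2 treat1 levels, 3 nested families (ind.n)
# per treat1 level, each assayed under both treat2 levels
d <- expand.grid(ind.n  = factor(1:3),
                 treat2 = factor(c("ctrl", "trt")),
                 treat1 = factor(c("g1", "g2")))

# Nested coding: family effects and treat2 effects within each treat1 level
mm <- model.matrix(~ treat1 + treat1:ind.n + treat1:treat2, data = d)
dim(mm)  # 12 rows, 8 columns
```

Because the `ind.n` main effect is absent, R expands `treat1:ind.n` into separate family columns within each "treat1" level, which is exactly what makes the coding efficient when families are nested.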
A comment on an earlier version of this answer suggests that you might have additional covariates. If so, the edgeR-recommended model matrix only has 28 columns and the DESeq2 recommendation only has 12, allowing you to add columns for some additional covariates (if they aren't linearly dependent on the columns already included). If you do need to use a mixed model, you might need to consider a two-step approach as in Trabzuni et al., Bioinformatics, Volume 30, Issue 11, 1 June 2014, Pages 1555–1561, which combined mixed modeling with subsequent finite mixture modeling to separate differentially expressed from non-differential transcripts.