Solved – Random Effects Design Matrix

fixed-effects-modelgeneralized linear modelrrandom-effects-modelsplit-plot

I'm trying to move away from ANOVA/t-tests and get a better understanding of GLMs. I am doing my statistical analysis in R using function lm (only fixed effects) and lmer (+ random effects). I also have access to Matlab.

1) When running a GLM analysis in R/Matlab, is there a way to see the design matrix used? I think this would be helpful in building my intuition of what is going on.

Let's say I have a study with a 2-level factor (condition) that is repeated within subject.

In R, if I don't include the random effect of subject,
I write this as "DV ~ condition".
Assuming a dummy coding of the factor condition, this would look like:

condA, sub1 = B_0

condB, sub1 = B_0 + B_1

condA, sub2 = B_0

condB, sub2 = B_0 + B_1
…..

Now I include the random effect of subject by adding a by-subject intercept
"DV ~ condition + (1 | sub)". My guess is that the model then looks something like this:

condA, sub1 = S_1

condB, sub1 = S_1 + B_1

condA, sub2 = S_2

condB, sub2 = S_2 + B_1

…..

The design matrix would look like:

[ 0 1 0 0 0 …;

1 1 0 0 0 …;

0 0 1 0 0 …;

1 0 1 0 0 …;

……………….]

The first column codes for the presence or absence of condition B and each subsequent column codes for each individual subject.

Two question following from this:

2) If this is how the model's regressors are solved, what is the difference between the fixed factor (conditions) and the random factor (subject). The Beta weights of the regressors seem to be solved equivalently. Is the standard error assessed differently?

3) What does the model look like if I am dealing with a split-plot design in which condition is repeated within subject, but subject is nested within group(a 2 level factor)?

If I take the simple case in which I assume no interaction between condition and group, I would write this as "DV ~ condition + group + (1 | sub)".

However, I'm not sure what the model would look like.

My guess is something like:

condA, group1, sub1 = S_1

condB, group 1, sub1 = S_1 + B_1

condA, group 1, sub2 = S_2

condB, group 1, sub2 = S_2 + B_1

condA, group 2, sub3 = G_1 + S_3

condB, group 2, sub3 = G_1 + S_3 + B_1

condA, group 2, sub4 = G_1 + S_4

condB, group 2, sub4 = G_1 + S_4 + B_1

But then it seems like I also need to add the following eqtn to ensure proper calculation of G_1:

Average(condB, group 2, sub3; condB, group 2, sub4) = G_1

Best Answer

(1) If fm1 is an object produced by lm() then model.matrix(fm1) is the $X$ matrix also known as the model matrix.

If fm2 is an object produced by lmer() then getME(fm2, "X") and getME(fm2, "Z") are the $X$ and $Z$ matrices. See https://en.wikipedia.org/wiki/Mixed_model for full definitions.

(2) if one thinks of a mixed model as a least squares problem then the difference between the fixed and random effects is that a penalty is added to the least squares objective function which depends only on the random effects coefficients. Further details can be found, for example, in https://projecteuclid.org/euclid.lnms/1249305331

(3) if this question is what the $X$ and $Z$ matrices look like for the indicated formula then run (1) above in R for your model with your data. If the question is how to formulate such models in R then see http://www.maths.bath.ac.uk/~jjf23/mixchange/split.html