Solved – Random Effects Design Matrix

fixed-effects-modelgeneralized linear modelrrandom-effects-modelsplit-plot

I'm trying to move away from ANOVA/t-tests and get a better understanding of GLMs. I am doing my statistical analysis in R using function lm (only fixed effects) and lmer (+ random effects). I also have access to Matlab.

1) When running a GLM analysis in R/Matlab, is there a way to see the design matrix used? I think this would be helpful in building my intuition of what is going on.

Let's say I have a study with a 2-level factor (condition) that is repeated within subject.

In R, if I don't include the random effect of subject,
I write this as "DV ~ condition".
Assuming a dummy coding of the factor condition, this would look like:

condA, sub1 = B_0

condB, sub1 = B_0 + B_1

condA, sub2 = B_0

condB, sub2 = B_0 + B_1
…..

Now I include the random effect of subject by adding a by-subject intercept
"DV ~ condition + (1 | sub)". My guess is that the model then looks something like this:

condA, sub1 = S_1

condB, sub1 = S_1 + B_1

condA, sub2 = S_2

condB, sub2 = S_2 + B_1

…..

The design matrix would look like:

[ 0 1 0 0 0 …;

1 1 0 0 0 …;

0 0 1 0 0 …;

1 0 1 0 0 …;

……………….]

The first column codes for the presence or absence of condition B and each subsequent column codes for each individual subject.

Two question following from this:

2) If this is how the model's regressors are solved, what is the difference between the fixed factor (conditions) and the random factor (subject). The Beta weights of the regressors seem to be solved equivalently. Is the standard error assessed differently?

3) What does the model look like if I am dealing with a split-plot design in which condition is repeated within subject, but subject is nested within group(a 2 level factor)?

If I take the simple case in which I assume no interaction between condition and group, I would write this as "DV ~ condition + group + (1 | sub)".

However, I'm not sure what the model would look like.

My guess is something like:

condA, group1, sub1 = S_1

condB, group 1, sub1 = S_1 + B_1

condA, group 1, sub2 = S_2

condB, group 1, sub2 = S_2 + B_1

condA, group 2, sub3 = G_1 + S_3

condB, group 2, sub3 = G_1 + S_3 + B_1

condA, group 2, sub4 = G_1 + S_4

condB, group 2, sub4 = G_1 + S_4 + B_1

But then it seems like I also need to add the following eqtn to ensure proper calculation of G_1:

Average(condB, group 2, sub3; condB, group 2, sub4) = G_1

Best Answer

(1) If fm1 is an object produced by lm() then model.matrix(fm1) is the $X$ matrix also known as the model matrix.

If fm2 is an object produced by lmer() then getME(fm2, "X") and getME(fm2, "Z") are the $X$ and $Z$ matrices. See https://en.wikipedia.org/wiki/Mixed_model for full definitions.

(2) if one thinks of a mixed model as a least squares problem then the difference between the fixed and random effects is that a penalty is added to the least squares objective function which depends only on the random effects coefficients. Further details can be found, for example, in https://projecteuclid.org/euclid.lnms/1249305331

(3) if this question is what the $X$ and $Z$ matrices look like for the indicated formula then run (1) above in R for your model with your data. If the question is how to formulate such models in R then see http://www.maths.bath.ac.uk/~jjf23/mixchange/split.html

Related Solutions

Solved – Compare regression slopes of repeated measures linear regression

You should probably use a linear mixed model to fit a random-slopes model:

library(nlme)
m1 <- lme(RT ~ span_num*group,
          random = ~span_num|subject, data=data)
summary(m1)

(You need to be careful here as I believe summary.lme gets the denominator degrees of freedom wrong for random-slopes models. If you have a large (>40) total number of subjects it won't matter much.)

library(lme4)
m2 <- lmer(RT ~ span_num*group+(span_num|subject),
           data=data, method="ML")
drop1(m2,test="Chisq")

The latter case ignores the denominator DF issue entirely. pbkrtest::KRmodcomp gets it right.

Solved – Glmer random effects model vs. dumthe-coded fixed effects

I am thinking about the same problem and found your question here. I'll tell you what I know so far. Maybe we can reach a satisfying conclusion.

The random effect models treat between subject (in your case, different Gamble.Nums) variation as normally distributed. But it may not be normal, and then if you try to fit a random effect model, the result will be biased. To fix this problem, people use dummy variables to code those subjects, and estimate their effect separately, instead of assuming a distribution. This approach looks good, but you have a bunch more parameters to estimate, and you don't have a statistical distribution of the whole population. For the random effect model, those subjects (Gamble.Nums) are assumed to be randomly sampled from a population. From the random model fit, you get estimates of the parameters of the distribution of that population.

Or, you can adopt a model selection point of view, and use anova(glmer, glm) to see which one is better.

From my own experience, the estimates from the glm using dummies and those from ranef(glmer) are strongly correlated. The real difference is not that large at all.

Best Answer

Related Solutions

Solved – Compare regression slopes of repeated measures linear regression

Solved – Glmer random effects model vs. dumthe-coded fixed effects

Related Question