I've used linear mixed models to test whether the factors genotype and sex influence colon length, with batch (BOX) included as a random effect. I first fit the model value ~ genotype + SEX + (1 | BOX)
and got the following results, with sex being significant:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: value ~ genotype + SEX + (1 | BOX)
   Data: ColonLength.new

REML criterion at convergence: 94.1

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.69433 -0.55283  0.00537  0.61300  2.01439

Random effects:
 Groups   Name        Variance Std.Dev.
 BOX      (Intercept) 0.1346   0.3669
 Residual             0.2819   0.5309
Number of obs: 49, groups:  BOX, 20

Fixed effects:
              Estimate Std. Error      df t value Pr(>|t|)
(Intercept)     8.0512     0.1832 16.1116  43.943  < 2e-16 ***
genotypemGlu5   0.0914     0.2281 16.3917   0.401  0.69379
SEXF           -0.7549     0.2280 16.5801  -3.310  0.00425 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) gntyG5
genotypmGl5 -0.515
SEXF        -0.518 -0.142
After adding a sex-by-genotype interaction with the formula value ~ genotype * SEX + (1 | BOX),
the SEXF coefficient is no longer significant:
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: value ~ genotype * SEX + (1 | BOX)
   Data: ColonLength.new

REML criterion at convergence: 92.7

Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.62511 -0.66926  0.05283  0.58236  2.10327

Random effects:
 Groups   Name        Variance Std.Dev.
 BOX      (Intercept) 0.1362   0.3691
 Residual             0.2800   0.5291
Number of obs: 49, groups:  BOX, 20

Fixed effects:
                   Estimate Std. Error      df t value Pr(>|t|)
(Intercept)          7.9527     0.2055 15.2546  38.707   <2e-16 ***
genotypemGlu5        0.3299     0.3196 14.4285   1.032    0.319
SEXF                -0.5174     0.3185 18.0472  -1.625    0.122
genotypemGlu5:SEXF  -0.4888     0.4570 15.6767  -1.070    0.301
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) gntyG5 SEXF
genotypmGl5 -0.643
SEXF        -0.645  0.415
gntyG5:SEXF  0.450 -0.699 -0.697
Should I report both? I.e., something like "the main effect of sex was significant (β = -.75, SE = .23, p = .004), but was no longer significant when the interaction between sex and genotype was included (β = -.52, SE = .32, p = .12)"? Is this an appropriate way to report the results? (I know some people recommend reporting the fixed-effect estimates, the confidence intervals, and the strength of the effect, while others report LMM results as "F(df, df_error) = F-value, p = p-value". Which is preferable?)
If there is a main effect, is it appropriate to then go and look at the interaction between terms? Or is that something I should primarily be doing if there's not a main effect observed?
Apologies if these questions are inane – I don't have much experience with statistics and have been kind of thrown in the deep end. I'd really appreciate any help.
Best Answer
In your second model, the SEXF coefficient (-0.5174) is the estimated effect of sex at the reference level of genotype. The estimated sex effect at the mGlu5 level is -0.5174 + (-0.4888) = -1.0062. So once the interaction is in the model, no single main effect of sex is reported, just different estimates for the different genotypes, here suggesting a larger sex effect for the mGlu5 genotype. Yet the p-value for the interaction is 0.301, so the data provide little evidence that those two effects genuinely differ in the population.
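To make the arithmetic concrete, here is a minimal base-R sketch that rebuilds the sex effect within each genotype from the coefficients printed in the question. The standard error for the mGlu5 row uses the rounded correlation (-0.697) from the "Correlation of Fixed Effects" table, so treat it as approximate. The variable names are mine, not anything from the original analysis.

```r
# Estimates and standard errors copied from the interaction model output
est <- c(genotype = 0.3299, sex = -0.5174, interaction = -0.4888)
se  <- c(genotype = 0.3196, sex = 0.3185, interaction = 0.4570)

# Correlation between the SEXF and genotypemGlu5:SEXF estimates (rounded in the output)
cor_sex_int <- -0.697

# Effect of sex (F vs. the reference sex) within each genotype
sex_at_ref   <- est[["sex"]]
sex_at_mglu5 <- est[["sex"]] + est[["interaction"]]

# SE of a sum of two estimates: Var(a + b) = Var(a) + Var(b) + 2 * Cov(a, b)
cov_sex_int     <- cor_sex_int * se[["sex"]] * se[["interaction"]]
se_sex_at_mglu5 <- sqrt(se[["sex"]]^2 + se[["interaction"]]^2 + 2 * cov_sex_int)

round(c(sex_at_ref      = sex_at_ref,
        sex_at_mglu5    = sex_at_mglu5,
        se_sex_at_mglu5 = se_sex_at_mglu5), 3)
```

This gives a sex effect of about -1.01 at the mGlu5 level with a standard error of roughly 0.33. If you have the fitted model object, a package such as emmeans can compute these within-genotype contrasts directly, with degrees of freedom, so you do not have to work from the rounded printout.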
Now, it probably makes more sense to think of the effect of genotype as varying by sex than of the effect of sex as varying by genotype (although mathematically they are the same thing). Still, your data give little evidence that an interaction is present, so I would probably report the main effects (first model) as your best estimates of the effects of sex and genotype, while mentioning that the second model shows little evidence for an interaction. That does not rule an interaction out: interactions are difficult to detect.
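If you report the first model with estimates and confidence intervals, the intervals can be sketched from the printed table alone. The base-R snippet below builds t-based 95% intervals from the estimates, standard errors, and Satterthwaite degrees of freedom shown in the question. The data frame and column names are illustrative, and this is only one reasonable way to form the intervals.

```r
# Estimates, standard errors and Satterthwaite df copied from the additive model output
coefs <- data.frame(
  term     = c("genotypemGlu5", "SEXF"),
  estimate = c(0.0914, -0.7549),
  se       = c(0.2281, 0.2280),
  df       = c(16.3917, 16.5801)
)

# Two-sided 95% interval: estimate +/- t quantile * SE (qt accepts fractional df)
t_crit      <- qt(0.975, df = coefs$df)
coefs$lower <- coefs$estimate - t_crit * coefs$se
coefs$upper <- coefs$estimate + t_crit * coefs$se

print(coefs, digits = 3)
```

The interval for SEXF excludes zero and the one for genotypemGlu5 does not, which matches the p-values in the table. With the fitted model in hand, `confint()` on the lme4 fit is an alternative. It uses profile likelihood by default, so its limits may differ slightly from these.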