Solved – Controlling for confounding variables in linear mixed effects models (lmer)

Tags: causality, lme4-nlme, mixed-model, r, random-effects-model

I'm using lmer to test how multiple variables (in this case, treatment, subspecies, and sex) influence avian behaviour.

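library(lme4)
# lmer() is the fitting function; lme4 is only the package name.
# A hyphen in a formula is read as subtraction, so the stimulus column is assumed to be named Stimulus_ID.
M1 <- lmer(Behaviour ~ Treatment + Subspecies + Sex + (1 | Individual) + (1 | Stimulus_ID), data = data)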

where Behaviour is continuous and Treatment, Subspecies, and Sex are all categorical. Individual and Stimulus ID are included as random effects because this was a repeated-measures design (for Individual) and because I want to reduce pseudoreplication by controlling for my stimuli (e.g., bird song playback), as is often done in behavioural research.

In early efforts, I've found that Treatment and Sex are important in some behavioural contexts and Subspecies in others (but interactions between these fixed factors are non-significant). However, while in the field, I noted other covariates that appear significant when I run them in the full model. For example, the time of day at which a behaviour was recorded is an important predictor of the overall behaviour.

However, I'm mostly interested in the effect of Treatment, Subspecies, and Sex.
I would like to control for this confounding variable (among others), but I'm pretty stumped on the proper way to code for this and I would appreciate any insight one may have. That is to say, I know time of day is important in predicting behaviour, so I want to account for this so that I can fully appreciate the effect Treatment/subspecies/sex has on individual behaviour.

If this is poorly written or needs further clarification, I'm happy to provide any more insight. Thanks in advance for your help and any suggestions you have!

Best Answer

You have stated that you believe Time is a confounding variable in this analysis. If so, then you should include Time as a covariate in the analysis.
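In lmer terms, including the covariate just means adding it to the fixed-effects part of the formula. The sketch below assumes a numeric column called TimeOfDay (a hypothetical name, in hours) that enters linearly, writes the stimulus column as Stimulus_ID, and uses simulated stand-in data with made-up effect sizes so that it runs on its own; it is not the original dataset.

```r
library(lme4)

set.seed(1)
# Simulated stand-in data; every column name and effect size here is hypothetical.
n_ind  <- 30
n_stim <- 10
dat <- expand.grid(Individual = factor(seq_len(n_ind)),
                   Treatment  = factor(c("control", "playback")))
dat$Stimulus_ID <- factor(sample(seq_len(n_stim), nrow(dat), replace = TRUE))
dat$Sex         <- factor(ifelse(as.integer(dat$Individual) %% 2 == 0, "F", "M"))
dat$Subspecies  <- factor(ifelse(as.integer(dat$Individual) <= n_ind / 2, "A", "B"))
dat$TimeOfDay   <- runif(nrow(dat), 6, 18)  # hours since midnight

ind_eff  <- rnorm(n_ind, sd = 1)     # per-individual random intercepts
stim_eff <- rnorm(n_stim, sd = 0.5)  # per-stimulus random intercepts
dat$Behaviour <- 2 +
  1.0 * (dat$Treatment == "playback") +
  0.3 * dat$TimeOfDay +
  ind_eff[as.integer(dat$Individual)] +
  stim_eff[as.integer(dat$Stimulus_ID)] +
  rnorm(nrow(dat), sd = 1)

# Same structure as the model in the question, with TimeOfDay added as a fixed-effect covariate.
M2 <- lmer(Behaviour ~ Treatment + Subspecies + Sex + TimeOfDay +
             (1 | Individual) + (1 | Stimulus_ID), data = dat)
fixef(M2)
```

If the relationship with time of day is not plausibly linear (for example, a dawn peak), a categorical or otherwise more flexible coding of the covariate may be more appropriate; the linear term is only the simplest choice.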

However, before doing so, it is important to ensure that the variable is indeed a (potential) confounder, or a competing exposure.

To be a confounder, it must be a cause, or a proxy of a cause, of the outcome, AND a cause, or a proxy of a cause, of the exposure(s). So, in this case, if Time causes Behaviour AND also causes any of the other exposures, then it is indeed a confounder. It seems unlikely that it can be a cause of Sex or Subspecies, but if it determines the Treatment given, then it is a confounder, and should be included as a covariate in order to obtain unbiased estimates of the other fixed effects. In that case the estimate for Time (and its statistical significance) is irrelevant and should not be interpreted.
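The bias from omitting a confounder can be seen in a small simulation. This is only a sketch: ordinary lm() stands in for lmer() to keep it short, and the variable names, the assignment mechanism, and the effect sizes are all assumptions made for illustration.

```r
set.seed(42)
n <- 5000

# Time of day (hours) drives treatment assignment: later trials are more
# likely to be treated. This is what makes time a confounder here.
time_of_day <- runif(n, 6, 18)
treatment   <- rbinom(n, 1, plogis((time_of_day - 12) / 2))

# Assumed truth: treatment raises behaviour by 1; time also acts directly.
behaviour <- 1 * treatment + 0.5 * time_of_day + rnorm(n)

# Treatment estimate without and with adjustment for the confounder.
unadjusted <- unname(coef(lm(behaviour ~ treatment))["treatment"])
adjusted   <- unname(coef(lm(behaviour ~ treatment + time_of_day))["treatment"])

c(unadjusted = unadjusted, adjusted = adjusted)
```

Under these assumptions the unadjusted estimate absorbs part of the time effect and lands well above the true value of 1, while the adjusted estimate stays close to it.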

On the other hand, if Time is on the causal pathway from the exposure(s) to the outcome, for example, if the Treatment influences the time of day at which the behaviour occurs, then it is a mediator and should not be included as a covariate. Including a mediator in a regression can invoke a reversal paradox (for example, Simpson's Paradox); see Tu et al. (2008).
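A small simulation can show why adjusting for a mediator changes the question being answered. As before, this is a sketch with lm() in place of lmer(), and the causal structure, names, and effect sizes are assumptions. It shows the simplest case, where the adjusted estimate drops the indirect part of the effect, rather than a full sign reversal.

```r
set.seed(7)
n <- 5000

# Treatment is randomised, then shifts the time at which the behaviour occurs.
treatment   <- rbinom(n, 1, 0.5)
time_of_day <- 12 + 2 * treatment + rnorm(n)

# Assumed truth: direct effect of treatment = 1, indirect effect via time = 2 * 0.5 = 1,
# so the total effect of treatment on behaviour is 2.
behaviour <- 1 * treatment + 0.5 * time_of_day + rnorm(n)

# Leaving the mediator out estimates the total effect.
total_est  <- unname(coef(lm(behaviour ~ treatment))["treatment"])
# Conditioning on the mediator leaves only the direct effect.
direct_est <- unname(coef(lm(behaviour ~ treatment + time_of_day))["treatment"])

c(total = total_est, direct_only = direct_est)
```

If the total effect of Treatment is the quantity of interest, the model without the mediator is the one that estimates it in this setup.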

Lastly, if Time is not a cause of the exposure(s) (but is a cause of the outcome), then it should be treated as a competing exposure, and included in the model as a covariate; by explaining otherwise unexplained variation in the outcome, this will improve the precision of the other fixed-effect estimates that you are interested in.
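The gain from a competing exposure can also be sketched by simulation. Here treatment is assigned independently of time, so both models give an unbiased treatment estimate, but the standard errors differ. Again lm() replaces lmer() for brevity, and all names and effect sizes are assumptions.

```r
set.seed(99)
n <- 2000

# Treatment is assigned independently of time of day (no confounding).
treatment   <- rbinom(n, 1, 0.5)
time_of_day <- runif(n, 6, 18)

# Time still affects the outcome, so it is a competing exposure.
behaviour <- 1 * treatment + 0.5 * time_of_day + rnorm(n)

fit_without <- lm(behaviour ~ treatment)
fit_with    <- lm(behaviour ~ treatment + time_of_day)

# Standard error of the treatment coefficient in each model.
se_without <- summary(fit_without)$coefficients["treatment", "Std. Error"]
se_with    <- summary(fit_with)$coefficients["treatment", "Std. Error"]

c(se_without = se_without, se_with = se_with)
```

Both fits recover a treatment effect near 1, but the model that includes time has the smaller standard error because the residual variance is lower.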

References:
Tu, Y.K., Gunnell, D. and Gilthorpe, M.S., 2008. Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox. Emerging Themes in Epidemiology, 5(1), p.2.