Multiple Regression – Effect of Including a Confounder or Mediator in Regression Models

Tags: confounding, mediation, multiple regression, regression coefficients

I have two questions about how a coefficient changes when a confounder or a mediator is included as an independent variable in a multiple regression.

When I include a mediator in a multiple regression, the significance of the exposure's coefficient may decrease. But will the coefficient (the value of the coefficient, rather than the SE of the coefficient) decrease as well?

Also, by definition, a confounder (if left unadjusted) may increase or decrease the correlation between the exposure and the outcome. So, when I include a confounder in a multiple regression, what will happen to the coefficient of the exposure? (As far as I know, the coefficient of the exposure tells us the effect of the exposure on the outcome when other variables are held constant. It does not reflect the degree of association between the exposure and the outcome.) Thank you!

Best Answer

Basically, when you add variables to a regression, no matter what those variables are, the coefficient on the exposure can increase or decrease. You generally cannot learn anything about the causal structure of your system by observing how a coefficient changes when you add covariates to or remove them from a model.
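One way to see why the direction of the change is unrestricted is the standard least-squares identity relating the model without the added variable to the model with it. The notation below is my own and not from the question: Y is the outcome, X the exposure, and Z the added covariate, whatever its causal role.

```latex
\begin{align}
\text{short model:}\quad     & Y = \alpha_0 + \beta_{\text{short}} X + e_0 \\
\text{long model:}\quad      & Y = \alpha_1 + \beta_{\text{long}} X + \gamma Z + e_1 \\
\text{auxiliary model:}\quad & Z = \alpha_2 + \delta X + e_2 \\
\Rightarrow\quad             & \hat\beta_{\text{short}} = \hat\beta_{\text{long}} + \hat\gamma\,\hat\delta
\end{align}
```

The product of the two estimates can be positive or negative, so adding Z can move the exposure coefficient in either direction. The identity is pure algebra and looks the same whether Z is a mediator or a confounder, which is why the change by itself says nothing about which one Z is.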

Often, adding a mediator will decrease the coefficient on the exposure because part of the effect of the exposure on the outcome is blocked, but in the presence of suppression, adding a mediator can increase the coefficient on the exposure. Suppression occurs when there are two compensatory pathways from the exposure to the outcome, and holding one of them constant magnifies the other. For example, consider the effect of cardio exercise on weight, with "number of calories consumed" as the mediator. Doing cardio increases the number of calories consumed, which increases your weight, but it also increases the number of calories burned, which decreases your weight. If these two effects were equal in size, there would be no total effect of cardio on weight. However, holding one of the pathways constant makes the remaining effect larger in magnitude. For example, holding constant the number of calories consumed, the effect of cardio on weight may be quite large (in the negative direction); indeed, this is the primary motivation behind controlling your diet when exercising to lose weight. Suppression is described in detail by Kim (2019).
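The cardio example can be mimicked with a small simulation. This is only a sketch under made-up assumptions: all variables are standardized, both pathways have strength 1, and the names (`cardio`, `eaten`, `burned`, `weight`) and the `ols` helper are hypothetical, written here with the standard library only.

```python
import random

def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in cols]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Assumed data-generating process (illustrative numbers only):
cardio = [rng.gauss(0, 1) for _ in range(n)]
eaten = [c + rng.gauss(0, 1) for c in cardio]    # cardio raises calories consumed
burned = [c + rng.gauss(0, 1) for c in cardio]   # cardio raises calories burned
weight = [e - b + rng.gauss(0, 1) for e, b in zip(eaten, burned)]

total = ols(weight, cardio)[1]           # exposure only: pathways cancel
direct = ols(weight, cardio, eaten)[1]   # mediator "eaten" held constant

print(f"cardio coefficient, mediator omitted:  {total:.2f}")
print(f"cardio coefficient, mediator included: {direct:.2f}")
```

Under these assumptions the first coefficient should come out near 0 and the second near -1, so including the mediator makes the exposure coefficient larger in magnitude rather than smaller.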

The coefficient on the exposure measures the partial association between the exposure and the outcome given the covariates in the model (it is a rescaled version of the partial correlation). So if including a confounder decreases the association between the exposure and the outcome, the coefficient on the exposure will also decrease.

You wrote: "As far as I know, the coefficient of the exposure tells us the effect of the exposure on the outcome when other variables are held constant. It does not reflect degree of association between the exposure and the outcome." This is incorrect. The coefficient on the exposure does represent the association between the exposure and the outcome; that's all it represents without strong assumptions about the nature of confounding. It may be a partial association (i.e., after adjusting for other covariates in the model).
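To make "partial association" concrete, here is a rough sketch that gets the adjusted coefficient by partialling the confounder out of both the exposure and the outcome and then regressing residuals on residuals (the Frisch-Waugh-Lovell result). The data-generating process is an assumption of mine: a true exposure effect of 1, and a confounder `z` that raises the exposure and either raises (`gamma=1`) or lowers (`gamma=-1`) the outcome. The helper names are hypothetical.

```python
import random

def slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after a simple regression on x (with intercept)."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def simulate(gamma, n=5000, seed=1):
    """Return (unadjusted, adjusted) exposure coefficients for one simulated data set."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]                 # confounder
    x = [zi + rng.gauss(0, 1) for zi in z]                  # exposure depends on z
    y = [xi + gamma * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    unadjusted = slope(x, y)
    # Partial z out of both x and y; the slope between the residuals equals
    # the coefficient on x in the regression of y on x and z.
    adjusted = slope(residuals(z, x), residuals(z, y))
    return unadjusted, adjusted

up_raw, up_adj = simulate(gamma=1.0)       # confounder inflates the association
down_raw, down_adj = simulate(gamma=-1.0)  # confounder masks part of it

print(f"gamma=+1: unadjusted {up_raw:.2f}, adjusted {up_adj:.2f}")
print(f"gamma=-1: unadjusted {down_raw:.2f}, adjusted {down_adj:.2f}")
```

With these assumed values the unadjusted coefficient should land near 1.5 in the first case and near 0.5 in the second, while the adjusted coefficient is close to 1 in both. Adjusting for a confounder can therefore move the exposure coefficient down or up. The adjusted value matches the causal effect here only because the simulation was built with no other confounding.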

For an excellent introduction to the interpretation of linear models in the context of causal inference (i.e., with respect to mediation and confounding), I recommend Pearl (2013).