Experiment Design – Importance of Using Control Variables in Experiments

experiment-design, random-allocation, treatment-effect

Why would one want to control for any number of baseline covariates in a situation where the assignment to treatment group is random?

My understanding is that randomly assigning treatment should make the treatment variable strictly exogenous, so that the control group can appropriately be treated as the counterfactual. The only exception I can think of is small samples, where random assignment can still produce unbalanced groups.

Any thoughts are much appreciated. Thanks!

Best Answer

From a frequentist perspective, an unadjusted comparison based on the permutation distribution can always be justified following a (properly) randomized study. A similar justification can be made for inference based on common parametric distributions (e.g., the $t$ distribution or $F$ distribution) because they approximate the permutation distribution. In fact, adjusting for covariates that were selected on the basis of post-hoc analyses risks inflating the Type I error rate. Note that this justification has nothing to do with the degree of balance in the observed sample, or with the size of the sample (except that for small samples the permutation distribution will be more discrete, and less well approximated by the $t$ or $F$ distributions).
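The unadjusted permutation comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, the function name `permutation_test` and its parameters are hypothetical, and the permutation distribution is approximated by random reshuffles of the treatment labels rather than enumerated exactly.

```python
import numpy as np

def permutation_test(y, treat, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the unadjusted
    difference in means (hypothetical helper, not a library function)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    observed = y[treat].mean() - y[~treat].mean()
    count = 0
    for _ in range(n_perm):
        # Re-randomize the labels while keeping the group sizes fixed.
        shuffled = rng.permutation(treat)
        diff = y[shuffled].mean() - y[~shuffled].mean()
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Simulated randomized study (assumed values: n = 40, true effect = 1.0).
rng = np.random.default_rng(42)
n = 40
treat = rng.permutation(np.repeat([True, False], n // 2))
y = 1.0 * treat + rng.normal(size=n)

observed_diff, p_value = permutation_test(y, treat)
print(f"difference in means: {observed_diff:.3f}, permutation p-value: {p_value:.4f}")
```

Nothing in this test looks at covariate balance: its validity comes only from the fact that the labels were assigned at random.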

That said, many people are aware that adjusting for covariates can increase precision in the linear model. Specifically, adjusting for covariates increases the precision of the estimated treatment effect when they are predictive of the outcome and not correlated with the treatment variable (as is true in the case of a randomized study). What is less well known, however, is that this does not automatically carry over to non-linear models. For example, Robinson and Jewell [1] show that in the case of logistic regression, controlling for covariates reduces the precision of the estimated treatment effect, even when they are predictive of the outcome. However, because the estimated treatment effect is also larger in the adjusted model, controlling for covariates predictive of the outcome does increase efficiency when testing the null hypothesis of no treatment effect following a randomized study.
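The linear-model part of this claim is easy to check by simulation. The sketch below assumes a single standard-normal covariate, a true treatment effect of 0.5, and a covariate coefficient of 2.0 (all values chosen for illustration; `treatment_estimates` is a hypothetical helper). It covers only the linear case, not the logistic regression result of Robinson and Jewell.

```python
import numpy as np

def treatment_estimates(n, rng, effect=0.5, beta=2.0):
    """Return (unadjusted, adjusted) OLS estimates of the treatment
    effect for one simulated randomized study of even size n."""
    x = rng.normal(size=n)                                  # prognostic covariate
    treat = rng.permutation(np.repeat([1.0, 0.0], n // 2))  # random assignment
    y = effect * treat + beta * x + rng.normal(size=n)
    ones = np.ones(n)
    X_unadj = np.column_stack([ones, treat])
    X_adj = np.column_stack([ones, treat, x])
    # Index 1 of each coefficient vector is the treatment coefficient.
    b_unadj = np.linalg.lstsq(X_unadj, y, rcond=None)[0][1]
    b_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0][1]
    return b_unadj, b_adj

rng = np.random.default_rng(0)
estimates = np.array([treatment_estimates(100, rng) for _ in range(2000)])

mean_unadj, mean_adj = estimates.mean(axis=0)
sd_unadj, sd_adj = estimates.std(axis=0, ddof=1)

print(f"unadjusted: mean {mean_unadj:.3f}, sd {sd_unadj:.3f}")
print(f"adjusted:   mean {mean_adj:.3f}, sd {sd_adj:.3f}")
```

Under these assumptions both estimators should centre on the true effect, while the adjusted estimator should vary noticeably less across replications, because the covariate absorbs outcome variance that would otherwise remain in the residual.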

[1] L. D. Robinson and N. P. Jewell. Some surprising results about covariate adjustment in logistic regression models. International Statistical Review, 59(2):227–240, 1991.
