Solved – Matching by or adjusting for confounders

adjustment, causality, confounding, matching, regression

When using regression models with a binary exposure, how do you choose whether to adjust for confounders as covariates or to match the two exposure groups on the confounders and then perform univariate regression?

I appreciate there are practical issues, such as how many covariates there are and how easy they are to match on (e.g. categorising a bunch of continuous covariates is not ideal), etc.

For simplicity, let's say it's a prospective cohort study, the outcome is binary, the exposure is binary, and there is just one confounder, age. You could use logistic regression adjusting for age as a covariate, or match by age and then compare proportions of the outcome according to exposure. Practicality aside, what's the reason to choose one or the other?
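To make the setup concrete, here is a minimal sketch (mine, not from the original question) that runs both analyses on simulated data: (1) logistic regression adjusting for age and (2) 1:1 exact matching on age followed by a comparison of outcome proportions. The data-generating model, the variable names, and the use of statsmodels are illustrative assumptions.

```python
# Sketch: regression adjustment vs. matching on a single confounder (age).
# Everything here (variable names, coefficients, matching scheme) is illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
age = rng.integers(30, 70, size=n)                          # confounder
p_exp = 1 / (1 + np.exp(-(-3 + 0.05 * age)))                # exposure depends on age
exposure = rng.binomial(1, p_exp)
p_out = 1 / (1 + np.exp(-(-4 + 0.04 * age + 0.7 * exposure)))
outcome = rng.binomial(1, p_out)
df = pd.DataFrame({"age": age, "exposure": exposure, "outcome": outcome})

# (1) Regression adjustment: conditional log-odds ratio for exposure
fit = smf.logit("outcome ~ exposure + age", data=df).fit(disp=0)
print("Adjusted log-OR for exposure:", round(fit.params["exposure"], 3))

# (2) 1:1 exact matching on age, then compare outcome proportions (ATT-style)
treated = df[df.exposure == 1]
controls = df[df.exposure == 0].copy()
matched_t, matched_c = [], []
for _, row in treated.iterrows():
    pool = controls[controls.age == row.age]
    if len(pool):                                           # drop treated units with no match
        j = pool.index[0]
        matched_t.append(row)
        matched_c.append(controls.loc[j])
        controls = controls.drop(j)                         # match without replacement
risk_t = np.mean([r.outcome for r in matched_t])
risk_c = np.mean([r.outcome for r in matched_c])
print("Matched risk difference (exposed - unexposed):", round(risk_t - risk_c, 3))
```

Note that the two approaches target different estimands (a conditional odds ratio versus a marginal risk difference among the matched), which is part of what the answer below gets at.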

I have a feeling the answer has to do with how much the distributions of the confounder, in this case age, overlap between the exposure groups, but I do not have a formal understanding of why one would choose one method over the other.

This article, for example, performs the analysis using both methods, except the covariates were, confusingly, different for each method.

Best Answer

This matter has been discussed in the literature a fair bit. See Ho, Imai, King, & Stuart (2007) and Kang & Schafer (2007) for some good intuitions on why you might prefer matching over regression.

One important benefit of matching, especially pair matching, is that one does not need to make functional form assumptions about the relationship between the confounder and the outcome. For example, if that relationship (conditional on treatment) were not well approximated by a simple regression model, bias would remain in the effect estimate using regression but not using matching. Of course, this is a bit of a straw man because it's possible to skillfully estimate a flexible regression model that accounts for nonlinearities. It's also possible to perform a sophisticated match that increases robustness to unmeasured confounding. Matching may also be preferred when there are many confounders, because it's harder to model the relationships of all of them with the outcome, but it may still be possible to match on them.
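As a hedged illustration of the functional-form point (my own simulation, not from Ho et al. or Kang & Schafer): below, the true exposure effect is zero, but exposure and outcome both depend on age through a U-shaped relationship. A logistic model that adjusts for age linearly is misspecified and reports a spurious effect, while exact matching/stratification on age needs no model for age and recovers an effect near zero.

```python
# Sketch: residual confounding from a misspecified (linear-in-age) adjustment,
# versus exact matching on age. The data-generating model is an assumption.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20000
age = rng.integers(20, 80, size=n)
u = ((age - 50) / 15.0) ** 2                                # U-shaped function of age
exposure = rng.binomial(1, 1 / (1 + np.exp(-(-1.5 + 1.5 * u))))
outcome = rng.binomial(1, 1 / (1 + np.exp(-(-2.5 + 1.5 * u))))   # no true exposure effect
df = pd.DataFrame({"age": age, "exposure": exposure, "outcome": outcome})

# Misspecified regression: adjusts for age linearly on the log-odds scale
lin = smf.logit("outcome ~ exposure + age", data=df).fit(disp=0)
print("Linear-in-age adjusted log-OR (true value 0):", round(lin.params["exposure"], 3))

# Exact matching on age (equivalently, an age-stratified comparison weighted
# by the number of exposed units per stratum): no model for age is needed
rd, weight = 0.0, 0
for _, s in df.groupby("age"):
    if s.exposure.nunique() == 2:                           # need both groups in the stratum
        n1 = (s.exposure == 1).sum()
        rd += n1 * (s[s.exposure == 1].outcome.mean() - s[s.exposure == 0].outcome.mean())
        weight += n1
print("Age-matched risk difference (true value 0):", round(rd / weight, 4))
```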

Matching can protect against extrapolation beyond the region of covariate overlap between your two groups; this is discussed by Ho et al. (2007). Matching can also protect you from capitalizing on chance when a regression model is readjusted after examining the effect estimates. Matching provides access to estimands not readily available with regression (e.g., the average treatment effect in the treated). Finally, matching estimates marginal effects, which cannot be estimated so easily with logistic regression.
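The last point, marginal versus conditional effects, can be seen with a small numeric example (my own illustration, not from the answer or the cited papers): because the odds ratio is non-collapsible, the stratum-specific (conditional) OR that a covariate-adjusted logistic regression targets differs from the collapsed (marginal) OR even when the stratifying variable is not a confounder.

```python
# Sketch: non-collapsibility of the odds ratio. The stratum variable z is
# balanced across exposure groups (so it is not a confounder), yet the
# marginal OR differs from the conditional OR of 3 built into each stratum.
def odds(p):
    return p / (1 - p)

# P(outcome = 1 | z, x); the conditional OR is 3 within each stratum z
p = {
    (0, 0): 0.10, (0, 1): odds(0.10) * 3 / (1 + odds(0.10) * 3),
    (1, 0): 0.60, (1, 1): odds(0.60) * 3 / (1 + odds(0.60) * 3),
}
for z in (0, 1):
    print(f"Conditional OR in stratum z={z}:",
          round(odds(p[(z, 1)]) / odds(p[(z, 0)]), 3))

# Marginal risks: strata are 50/50 in both exposure groups (no confounding)
p1 = 0.5 * p[(0, 1)] + 0.5 * p[(1, 1)]
p0 = 0.5 * p[(0, 0)] + 0.5 * p[(1, 0)]
print("Marginal OR after collapsing over z:", round(odds(p1) / odds(p0), 3))
```

Running this prints a conditional OR of 3 in each stratum but a marginal OR of roughly 2.1, which is the kind of marginal quantity a matched comparison of proportions estimates directly.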

On the other hand, regression is optimally efficient, doesn't require you to throw away units, and allows you to estimate conditional effects and interactions.
