R Programming – Use of Confounders in Generating Propensity Scores for Matching

causality, matching, propensity-scores, r

If the purpose of a propensity score is to estimate the "propensity" of a unit to receive treatment, is it correct to say that the predictors of treatment need not (necessarily) be confounders? For example, Z1 may be theoretically related to both Y and T, while Z2 may be related only to T and not to Y. If the goal is to predict treatment, can any predictor of T be used, regardless of its correlation with Y?

For causal inference purposes, I completely understand including all available confounders when generating propensity scores. However, it seems that certain variables associated with Y but not with T would be valuable for creating a more accurate propensity score. Is this line of thinking correct? Or should propensity scores be generated exclusively from confounders?

Best Answer

The balancing property of propensity scores has nothing to do with whether the predictors are confounders or not. It is a purely statistical property, independent of causal inference, matching, and so on. Pearl explains this extremely clearly in his book Causality (see Section 11.3.5 in particular).
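For reference, the balancing property can be written out as follows. The notation $e(X)$ for the propensity score is my own shorthand here, not something used elsewhere in this answer:

$$e(X) = P(T = 1 \mid X), \qquad X \perp T \mid e(X)$$

It holds for any set of covariates $X$, confounders or not, because $P(T = 1 \mid X, e(X)) = e(X) = P(T = 1 \mid e(X))$. No statement about $Y$ or about causal structure enters the argument.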

Propensity scores are used to estimate the non-causal "adjustment" estimand $E[E[Y|X, T = 1]] - E[E[Y|X, T = 0]]$ without specifying models for $E[Y|X, T = 1]$ and $E[Y|X, T = 0]$. What the adjustment estimand means to you is a different story. If you seek to estimate the total effect of $T$ on $Y$, i.e., $E[Y^1] - E[Y^0]$, then the adjustment estimand is equal to the total effect when the assumptions required for causal inference are met, which include that $X$ contains a sufficient set of variables to eliminate confounding by closing all backdoor paths and not opening new backdoor paths. This set of variables is called a "sufficient adjustment set". If the adjustment estimand means something different to you (i.e., it refers to a non-causal quantity or a causal quantity other than the total effect), then you will need different rules for choosing $X$.
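To make the adjustment estimand concrete, here is a minimal Python sketch on hypothetical toy data with a single binary covariate. The data and the function names (`naive_difference`, `adjustment_estimand`, `ipw_estimate`) are made up for illustration. It computes the estimand once from outcome cell means and once by inverse probability weighting with the empirical propensity score $P(T = 1 \mid X = x)$, which uses no outcome model at all. It assumes every covariate stratum contains both treated and control units.

```python
from collections import defaultdict

# Hypothetical toy data: (x, t, y) with one binary covariate x and treatment t.
rows = (
    [(0, 0, 0.0)] * 3 + [(0, 1, 1.0)] * 1 +
    [(1, 0, 2.0)] * 1 + [(1, 1, 3.0)] * 3
)

def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and control."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def adjustment_estimand(rows):
    """Plug-in estimate of E[E[Y|X,T=1]] - E[E[Y|X,T=0]] from cell means."""
    sums = defaultdict(lambda: [0.0, 0.0])   # x -> [sum of y | t=0, sum of y | t=1]
    counts = defaultdict(lambda: [0, 0])     # x -> [n with t=0, n with t=1]
    for x, t, y in rows:
        sums[x][t] += y
        counts[x][t] += 1
    n = len(rows)
    estimate = 0.0
    for x, (n0, n1) in counts.items():
        if n0 == 0 or n1 == 0:
            raise ValueError(f"stratum x={x} lacks treated or control units")
        weight = (n0 + n1) / n               # empirical P(X = x)
        estimate += weight * (sums[x][1] / n1 - sums[x][0] / n0)
    return estimate

def ipw_estimate(rows):
    """Same estimand by inverse probability weighting; no outcome model."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t, _ in rows:
        treated[x] += t
        total[x] += 1
    e = {x: treated[x] / total[x] for x in total}   # empirical propensity score
    n = len(rows)
    mean1 = sum(t * y / e[x] for x, t, y in rows) / n
    mean0 = sum((1 - t) * y / (1 - e[x]) for x, t, y in rows) / n
    return mean1 - mean0

naive = naive_difference(rows)
adjusted = adjustment_estimand(rows)
ipw = ipw_estimate(rows)
print(naive, adjusted, ipw)  # 2.0 1.0 1.0
```

The two adjusted numbers agree because the propensity score here is saturated in $X$. Whether the value 1.0 can be read as the total effect of $T$ on $Y$ depends entirely on whether $X$ is a sufficient adjustment set, which the code cannot tell you.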

The definition of a confounder is precise and described in another answer of mine. A confounder is a member of a minimal sufficient adjustment set, that is, a sufficient adjustment set for which no proper subset is also a sufficient adjustment set. Your adjustment set may also contain variables that are not part of any minimal sufficient adjustment set, in which case they are not confounders. Even so, we know that including such variables can affect the precision of the resulting effect estimate in finite samples. In particular, including instruments (causes of the treatment that do not cause the outcome) reduces precision, whereas including prognostic variables (causes of the outcome that do not cause and are not caused by the treatment) increases precision.
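The precision claim can be checked with a small simulation. The following is a minimal Python sketch under assumptions I made up for illustration: binary covariates, a true effect of 1, and specific coefficients. `X` is a confounder, `Z` an instrument, and `W` a prognostic variable. It uses exact stratification rather than propensity score matching to keep the code short, which is consistent with the point that the pattern is not specific to propensity scores.

```python
import random
import statistics
from collections import defaultdict

def simulate(n, rng):
    """Draw n units as (x, z, w, t, y); the true effect of t on y is 1."""
    data = []
    for _ in range(n):
        x = int(rng.random() < 0.5)   # confounder: affects t and y
        z = int(rng.random() < 0.5)   # instrument: affects t only
        w = int(rng.random() < 0.5)   # prognostic variable: affects y only
        t = int(rng.random() < 0.1 + 0.1 * x + 0.7 * z)
        y = 1.0 * t + 1.0 * x + 3.0 * w + rng.gauss(0.0, 1.0)
        data.append((x, z, w, t, y))
    return data

def stratified_effect(data, key):
    """Adjustment estimand with strata defined by key(unit)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(lambda: [0, 0])
    for unit in data:
        t, y = unit[3], unit[4]
        s = key(unit)
        sums[s][t] += y
        counts[s][t] += 1
    effect = 0.0
    for s, (n0, n1) in counts.items():
        if n0 == 0 or n1 == 0:
            raise ValueError(f"stratum {s} lacks treated or control units")
        effect += (n0 + n1) / len(data) * (sums[s][1] / n1 - sums[s][0] / n0)
    return effect

# Three adjustment sets; each is sufficient because each contains X.
ADJUSTMENT_SETS = {
    "X": lambda u: (u[0],),
    "X + Z": lambda u: (u[0], u[1]),
    "X + W": lambda u: (u[0], u[2]),
}

def run(reps=300, n=800, seed=1):
    """Return {adjustment set: (mean, standard deviation) of the estimates}."""
    rng = random.Random(seed)
    estimates = {name: [] for name in ADJUSTMENT_SETS}
    for _ in range(reps):
        data = simulate(n, rng)
        for name, key in ADJUSTMENT_SETS.items():
            estimates[name].append(stratified_effect(data, key))
    return {name: (statistics.mean(v), statistics.stdev(v))
            for name, v in estimates.items()}

results = run()
for name, (mean, sd) in results.items():
    print(f"{name:6s} mean={mean:.3f} sd={sd:.3f}")
```

Under these assumptions all three adjustment sets should give estimates centered near the true effect of 1. The spread of the estimates should be largest when the instrument `Z` is added and smallest when the prognostic variable `W` is added.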

This is true whether you are using propensity scores to estimate the adjustment estimand or any other method that relies on covariate adjustment, like matching methods that don't use the propensity score or regression adjustment. There is nothing special about propensity scores in terms of variable selection.

The goal is not to predict treatment well. The goal is to estimate a propensity score that balances the sufficient adjustment set and does so while maintaining precision. Including the confounders and prognostic variables accomplishes this, and including instruments can make things worse.
