Propensity Scores – Should Variables Used in Exact Matching Also Be Used in the Model?


In propensity score matching, we can match on some variables exactly; for example, we can match males only with other males. The same variable can additionally be included in the propensity score model. Here is some SAS code showing an example of 2:1 matching (control:treatment) on the logit of the propensity score:

proc psmatch data = data_to_match;
  class treated gender;  /* the treatment variable must also be listed in CLASS */
  psmodel treated(treated = '1') = gender IQ;  /* assumes treated is coded 1/0 */
  match method = greedy (k = 2) exact = (gender) stat = lps;
  output out (obs = match) = matched_data matchid = match_id;
run;

Note how gender is used in the EXACT= option and in the PSMODEL statement. I assume R and other statistical packages offer the same types of options.

Is it necessary to use gender in both places?

I could see it both ways:

  1. Yes, because you get a more accurate propensity score.
  2. No, because you did an exact match, which should no longer impact the outcome and therefore should not impact the propensity score.

The examples on the SAS support site include gender in both places, which leads me to think that is the correct specification.

Best Answer

Yes, we can (and it is recommended to) use a variable $x$ that we used for matching in our final model as well. The matching itself can also consist of several steps, as here: exact matching followed by propensity score matching (PSM). Using multiple procedures in our analysis does not require that $x$ be used in only one of the steps.

Using certain covariates both in the matching procedure and in other steps of the analysis falls broadly within the context of doubly robust estimators; Stuart (2010), "Matching methods for causal inference: A review and a look forward", and Kang & Schafer (2007), "Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data", are good places to read about this in more detail. As you correctly recognise, using $x$ again can improve model efficiency (e.g. the standard errors will be smaller). This holds even when exact matching is followed by propensity score estimation, since the propensity score is ultimately the output of a logistic model. Because the matching procedure is not guaranteed to be perfect, using $x$ both for matching and in the final model is almost certainly more helpful (e.g. it guards against misspecification of the propensity score model and reduces the standard errors of the final estimates), at the expense of slightly fewer degrees of freedom in the model.
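The efficiency point can be illustrated with a small simulation. The sketch below is written in plain Python (standard library only) rather than SAS, and everything in it is hypothetical: the data-generating process, the variable names (`male`, `iq`, `y`), the effect sizes, and the helpers `invert`, `fit_logit` and `ols`. It exact-matches on gender, performs greedy 2:1 nearest-neighbour matching on the logit of a propensity score whose model also includes gender, and then compares the standard error of the treatment coefficient in outcome models without and with gender. The order in which treated units are processed is an arbitrary choice, and the naive OLS standard errors ignore the matched-set structure, so the numbers are illustrative only.

```python
import math
import random


def invert(m):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return the coefficients."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * xi[i] * xi[j]
        inv = invert(hess)
        beta = [beta[i] + sum(inv[i][j] * grad[j] for j in range(k)) for i in range(k)]
    return beta


def ols(X, y):
    """Ordinary least squares; return (coefficients, naive standard errors)."""
    n, k = len(X), len(X[0])
    inv = invert([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]


# Hypothetical data: being male and having a higher IQ raise the chance of treatment;
# the true treatment effect is 2 and gender has a strong effect (5) on the outcome.
random.seed(1)
units = []
for _ in range(600):
    male = 1 if random.random() < 0.5 else 0
    iq = random.gauss(100, 15)
    p_treat = 1.0 / (1.0 + math.exp(-(-2.0 + 0.5 * male + 0.03 * (iq - 100))))
    treated = 1 if random.random() < p_treat else 0
    y = 2 * treated + 5 * male + 0.1 * (iq - 100) + random.gauss(0, 1)
    units.append({"male": male, "iq": iq, "treated": treated, "y": y})

# Propensity score model that includes gender, mirroring the question's SAS code.
X_ps = [[1.0, u["male"], (u["iq"] - 100) / 15] for u in units]
beta_ps = fit_logit(X_ps, [u["treated"] for u in units])
for u, x in zip(units, X_ps):
    u["lps"] = sum(b * v for b, v in zip(beta_ps, x))  # logit of the propensity score

# Exact matching on gender, then greedy 2:1 nearest-neighbour matching on the logit PS.
matched = []  # consecutive triples: one treated unit followed by its two controls
for g in (0, 1):
    controls = [u for u in units if u["male"] == g and not u["treated"]]
    treated_g = [u for u in units if u["male"] == g and u["treated"]]
    for t in sorted(treated_g, key=lambda u: -u["lps"]):
        if len(controls) < 2:
            break  # not enough unused controls left in this gender stratum
        controls.sort(key=lambda c: abs(c["lps"] - t["lps"]))
        matched += [t] + controls[:2]
        controls = controls[2:]  # match without replacement

# Outcome models on the matched sample, without and with gender.
outcome = [u["y"] for u in matched]
_, se_unadj = ols([[1.0, u["treated"]] for u in matched], outcome)
beta_adj, se_adj = ols([[1.0, u["treated"], u["male"]] for u in matched], outcome)
print(f"effect adjusted for gender: {beta_adj[1]:.2f}")
print(f"SE without gender: {se_unadj[1]:.3f}, SE with gender: {se_adj[1]:.3f}")
```

Because every treated unit receives two controls of the same gender, gender is balanced by construction in the matched sample, so adding it to the outcome model mainly absorbs outcome variance rather than changing the treatment contrast; in this setup the standard error with gender should come out smaller. A real analysis would also account for the matched sets (e.g. cluster-robust standard errors), which this sketch omits.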

As always, doubly/triply/quadruply/quintuply robust methods, or any other balancing method (e.g. entropy balancing), cannot protect us against unmeasured confounding variables.
