Regression Coefficients – How to Interpret for Covariates After Matching

Tags: matching, propensity-scores, regression-coefficients

I am new to the fascinating world of matching and propensity scoring. I will very likely be using one or more matching methods for my forthcoming project, probably with the R package MatchIt.

I am particularly interested in finding out whether there is a method that makes it possible to interpret the regression coefficients of covariates following matching. In math terms, say you have an outcome variable Y, a treatment variable A and a couple of covariates X1 and X2. So you first match with a model that looks something like this: A ~ X1 + X2 (I'm using the R language notation here) and then use the matched dataset to run a regression like Y ~ A + X1 + X2. I would like to be able to interpret the coefficients of this model for X1 and X2.
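For concreteness, that two-step workflow could be sketched roughly as follows. The data are simulated, and every variable name and effect size is a placeholder; `matchit()` and `match.data()` are the MatchIt calls I am assuming would be used, and 1:1 nearest-neighbour matching on a logistic-regression propensity score is chosen only as an example.

```r
library(MatchIt)

# Simulated data purely for illustration; names and effect sizes are assumptions.
set.seed(42)
n  <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n)
A  <- rbinom(n, 1, plogis(-1 + 0.5 * X1 + 0.5 * X2))  # treatment depends on X1, X2
Y  <- 2 * A + X1 + X2 + rnorm(n)
d  <- data.frame(Y, A, X1, X2)

# Step 1: match treated to control units using the model A ~ X1 + X2
m  <- matchit(A ~ X1 + X2, data = d, method = "nearest", distance = "glm")
md <- match.data(m)  # matched units, with added `weights` and `subclass` columns

# Step 2: outcome regression on the matched sample
fit <- lm(Y ~ A + X1 + X2, data = md, weights = weights)
coef(fit)  # the question: can the X1 and X2 entries be interpreted?
```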

The relevant peer-reviewed work (Ho et al., 2007, 2011) and vignettes (Greifer, 2022) have confused me a bit. It looks like the school of thought represented by Ho et al. is that matching can be used for preprocessing data before applying some model with Y as the outcome and A, X1 and X2 as the independent variables (like the model above). To me, this is like saying that, after matching the data, I can use whatever model, e.g. linear regression, GLM, GAM, Random Forest etc. to interpret the relationship of Y with X1 and X2 through e.g. regression coefficients, partial plots etc. (See also "How exactly to evaluate Treatment effect after Matching?" and "Regression after matching".)

On the other hand, Greifer (2022) states, in many places, that one should not interpret the covariate coefficients:

It is important not to interpret the coefficients and tests of the other covariates in the outcome model. These are not causal effects and their estimates may be severely confounded. Only the treatment effect estimate can be interpreted as causal assuming the relevant assumptions about unconfoundedness are met. Inappropriately interpreting the coefficients of covariates in the outcome model is known as the Table 2 fallacy […]

So it is not clear to me if I can or cannot interpret the coefficients of the covariates following matching. I am probably missing something, so some clarity would be highly appreciated!

References
Greifer N (2022). Estimating Effects After Matching. https://kosukeimai.github.io/MatchIt/articles/estimating-effects.html#after-pair-matching-without-replacement [accessed 11 Jan 2022]

Ho DE, Imai K, King G & Stuart EA (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis 15:199–236. doi:10.1093/pan/mpl013

Ho DE, Imai K, King G & Stuart EA (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software 42(8).

Best Answer

Because matching concerns a focal treatment variable, the nonparametric preprocessing only reduces the model dependence of the effect of that variable on the outcome, but not of other variables on the outcome. The (intended) result of matching is to eliminate the association between the treatment and the covariates, but the relationships among the covariates remain. So, not all predictors are treated equally in the nonparametric preprocessing; the treatment is given a special status that allows its coefficient not to depend on how the outcome model is specified. I discuss this a bit in this answer.
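As a minimal sketch of this point, the base-R simulation below uses a hand-rolled 1:1 exact matching step as a stand-in for a real matching routine (it is not MatchIt's algorithm). The variable names, the data-generating model, and the effect sizes are all illustrative assumptions.

```r
# Simulated data; every name and effect size here is an illustrative assumption.
set.seed(1)
n  <- 5000
X1 <- rbinom(n, 1, 0.5)
X2 <- rbinom(n, 1, ifelse(X1 == 1, 0.8, 0.2))      # X2 is associated with X1
A  <- rbinom(n, 1, plogis(-1 + 0.5 * X1 + 0.5 * X2))  # treatment depends on both
Y  <- 2 * A + 1 * X1 + 3 * X2 + rnorm(n)           # true treatment effect is 2
d  <- data.frame(Y, A, X1, X2)

# Hand-rolled 1:1 exact matching: within each (X1, X2) stratum,
# keep equal numbers of treated and control units.
matched <- do.call(rbind, lapply(split(d, list(d$X1, d$X2)), function(s) {
  k <- min(sum(s$A == 1), sum(s$A == 0))
  rbind(s[s$A == 1, ][seq_len(k), ], s[s$A == 0, ][seq_len(k), ])
}))

# Treatment is now unassociated with the covariates
# (zero up to floating-point error) ...
cor(matched$A, matched$X1)
cor(matched$A, matched$X2)
# ... but the covariates are still associated with each other.
cor(matched$X1, matched$X2)

# Coefficient on A under three different outcome-model specifications
a_none  <- coef(lm(Y ~ A,           data = matched))["A"]
a_short <- coef(lm(Y ~ A + X1,      data = matched))["A"]
a_full  <- coef(lm(Y ~ A + X1 + X2, data = matched))["A"]
c(a_none, a_short, a_full)

# Coefficient on X1 with and without X2 in the model
x1_short <- coef(lm(Y ~ A + X1,      data = matched))["X1"]
x1_full  <- coef(lm(Y ~ A + X1 + X2, data = matched))["X1"]
c(x1_short, x1_full)
```

Because exact matching balances the joint distribution of X1 and X2 perfectly, the coefficient on A in this sketch is the same in all three models up to floating-point error; with approximate matching methods it would only be approximately stable. The coefficient on X1, by contrast, changes substantially depending on whether X2 is included, so matching has done nothing to reduce its model dependence.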

That is separate from the issue of whether you should interpret the coefficients of covariates at all. The answer to this is "no", and it has nothing to do with matching. This is covered extensively in Westreich and Greenland (2013) on the "Table 2 fallacy". The reason interpreting the non-focal coefficients causally is so problematic is that the relationship between the covariates and the outcome may be totally confounded or suffer from collider bias. At best, these coefficients can be interpreted only as conditional relationships (i.e., conditional on the treatment, which mediates the relationship between the covariates and the outcome), and even then their effects may be confounded, blocked, or otherwise biased beyond interpretation.
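A small base-R simulation can make this concrete. It is only a sketch under assumed conditions: a hypothetical unmeasured variable U affects both a covariate X and the outcome Y, the treatment A depends on X alone, and all names and effect sizes are invented for illustration. No matching is involved, since the problem does not depend on it.

```r
# Illustrative data-generating model; all names and numbers are assumptions.
set.seed(2)
n <- 20000
U <- rnorm(n)                            # unmeasured common cause of X and Y
X <- rbinom(n, 1, plogis(1.5 * U))       # covariate, affected by U
A <- rbinom(n, 1, 0.2 + 0.6 * X)         # treatment depends on X only
Y <- 2 * A + 1 * X + 1.5 * U + rnorm(n)  # direct effects: A = 2, X = 1

# Outcome model adjusting for X (U is unmeasured, so it cannot be included)
fit <- lm(Y ~ A + X)
b_A <- coef(fit)["A"]  # targets the treatment effect (true value 2)
b_X <- coef(fit)["X"]  # confounded by U: not the direct effect of X
c(b_A, b_X)

# Same structure without U: the X coefficient is then only the direct
# effect of X, because conditioning on A blocks the path X -> A -> Y.
Y0 <- 2 * A + 1 * X + rnorm(n)
x_direct <- coef(lm(Y0 ~ A + X))["X"]  # targets the direct effect (true value 1)
x_total  <- coef(lm(Y0 ~ X))["X"]      # targets the total effect (1 + 0.6 * 2 = 2.2)
c(x_direct, x_total)
```

In this setup X is sufficient to deconfound the effect of A, so the coefficient on A is well estimated, while the coefficient on X in the same model is either biased by U or, at best, a direct effect that excludes the part of X's influence that runs through A.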

So, I will reiterate with certainty that you should not interpret the coefficients on covariates in a regression model fit after matching; only the coefficient on the treatment variable is interpretable as a total, covariate-adjusted effect on the outcome. The literature on nonparametric preprocessing is written with this context already in mind (i.e., with the focus only on a single focal causal variable), which is why the point may be under-addressed there.