Propensity Scores – How to Compute Standardized Mean Differences After Propensity Score Adjustment

propensity-scores standardized-mean-difference

Standardized mean differences (SMDs) are a key balance diagnostic after propensity score matching (e.g., Zhang et al.).

Their computation is straightforward after matching. However, I am not planning to conduct propensity score matching but rather propensity score adjustment, i.e., using the propensity score as a covariate in either a linear or a logistic regression model (see, for instance, Bokma et al. for a suitable example). I am not aware of any specific approach for computing SMDs in such scenarios.

Thus, my question is:

Can SMDs also be computed when performing a propensity score-adjusted analysis?

Best Answer

I'm going to give you three answers to this question, even though one is enough. In summary, don't use propensity score adjustment. It consistently performs worse than other propensity score methods and adds few, if any, benefits over traditional regression.

The first answer is that you can't. Matching is a "design-based" method, meaning the sample is adjusted without reference to the outcome, similar to the design of a randomized trial. Here, you can assess balance in the sample in a straightforward way by comparing the distributions of covariates between the groups in the matched sample just as you could in the unmatched sample. In contrast, propensity score adjustment is an "analysis-based" method, just like regression adjustment; the sample itself is left intact, and the adjustment occurs through the model. In the same way you can't* assess how well regression adjustment is doing at removing bias due to imbalance, you can't* assess how well propensity score adjustment is doing at removing bias due to imbalance, because as soon as you've fit the model, a treatment effect is estimated and yet the sample is unchanged. Indeed, this is an epistemic weakness of these methods; you can't assess the degree to which confounding due to the measured covariates has been reduced when using regression. Therefore, matching in combination with rigorous balance assessment should be used if your goal is to convince readers that you have truly eliminated substantial bias in the estimate.


The second answer is that Austin (2008) developed a method for assessing balance on covariates when conditioning on the propensity score. The method is as follows:

  1. Fit a regression model of the covariate on the treatment, the propensity score, and their interaction
  2. Generate predicted values under treatment and under control for each unit from this model
  3. Subtract the means of these values
  4. Divide by the estimated residual standard deviation (if the covariate, which is the outcome of this model, is continuous) or a standard deviation computed from the predicted probabilities (if the covariate is binary)

This is equivalent to performing g-computation to estimate the effect of the treatment on the covariate, adjusting only for the propensity score. If, conditional on the propensity score, there is no association between the treatment and the covariate, then the covariate would no longer induce confounding bias in the propensity score-adjusted outcome model. Of course, this method only tests for mean differences in the covariate, but repeating it with transformations of the covariate can give a broader picture of balance. Though this methodology is intuitive, there is no empirical evidence for its use, and there will always be scenarios where it fails to capture relevant imbalance on the covariates. It also requires a specific correspondence between the outcome model and the models for the covariates, but those models might not be expected to be similar at all (e.g., if they involve different model forms or different assumptions about effect heterogeneity).
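
The four steps can be sketched in Python for the continuous-covariate case. This is only a minimal illustration, not Austin's own code: the function name `conditional_smd`, the simulated data, and the use of a known (rather than estimated) propensity score are all assumptions made for the example, and the binary-covariate case (which would need a logistic model) is not covered.

```python
import numpy as np

def conditional_smd(x, treat, ps):
    """Standardized difference in a continuous covariate x, conditional on the
    propensity score (hypothetical helper following the four steps above)."""
    x = np.asarray(x, dtype=float)
    treat = np.asarray(treat, dtype=float)
    ps = np.asarray(ps, dtype=float)
    n = x.size

    def design(t):
        # Intercept, treatment, propensity score, and their interaction
        return np.column_stack([np.ones(n), t, ps, t * ps])

    # Step 1: regress the covariate on treatment, PS, and treatment x PS
    X = design(treat)
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)

    # Step 2: predicted covariate values for every unit under treatment and control
    pred_treated = design(np.ones(n)) @ beta
    pred_control = design(np.zeros(n)) @ beta

    # Step 3: difference between the means of the predictions
    diff = pred_treated.mean() - pred_control.mean()

    # Step 4: divide by the estimated residual standard deviation
    resid = x - X @ beta
    sigma = np.sqrt(resid @ resid / (n - X.shape[1]))
    return diff / sigma

# Simulated illustration (assumed data; the propensity score is taken as known)
rng = np.random.default_rng(0)
n = 5000
ps = rng.uniform(0.2, 0.8, n)
treat = rng.binomial(1, ps)
x_balanced = 4 * ps + rng.normal(0, 1, n)                  # depends on PS only
x_imbalanced = 4 * ps + 0.5 * treat + rng.normal(0, 1, n)  # residual imbalance

smd_balanced = conditional_smd(x_balanced, treat, ps)
smd_imbalanced = conditional_smd(x_imbalanced, treat, ps)
print(smd_balanced, smd_imbalanced)
```

In this simulation, `smd_balanced` should be close to 0 and `smd_imbalanced` close to 0.5, up to sampling error, because the second covariate was constructed to differ by half a residual standard deviation between groups at any given propensity score.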


The third answer relies on a recent discovery: the "implied" weights of linear regression for estimating the effect of a binary treatment, as described by Chattopadhyay and Zubizarreta (2021). Basically, the treatment coefficient from a regression of the outcome on the treatment and covariates is equivalent to a weighted mean difference between the outcomes of the treated and the outcomes of the control, where the weights take on a specific form based on the form of the regression model. These weights often include negative values, which distinguishes them from traditional propensity score weights, but they are otherwise conceptually similar. In theory, you could use these weights to compute weighted balance statistics like you would if you were using propensity score weights. Your outcome model would, of course, be the regression of the outcome on the treatment and propensity score. From that model, you could compute the weights and then compute standardized mean differences and other balance measures. All of this assumes that you are fitting a linear regression model for the outcome. As this is a recently developed methodology, its properties and effectiveness have not been empirically examined, but it has a stronger theoretical basis than Austin's method and allows for a more flexible balance assessment.
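
As a rough sketch of this idea, the weights can be read off the least-squares algebra: the treatment coefficient is a linear combination of the outcomes, so the coefficients of that combination (with the sign flipped for control units) act as weights. The code below assumes a main-effects linear model of the outcome on the treatment and the propensity score; the function names `implied_weights` and `weighted_smd`, the simulated data, and the choice of the unweighted pooled standard deviation as the standardization factor are assumptions for illustration, and the notation may differ from the paper's.

```python
import numpy as np

def implied_weights(treat, covs):
    """Weights implied by OLS of the outcome on [1, treat, covs]
    (hypothetical helper; main-effects linear model only)."""
    treat = np.asarray(treat, dtype=float)
    covs = np.asarray(covs, dtype=float).reshape(len(treat), -1)
    D = np.column_stack([np.ones(len(treat)), treat, covs])
    # For a full-rank design, pinv(D) equals (D'D)^{-1} D'; its second row maps
    # the outcome vector to the treatment coefficient.
    c = np.linalg.pinv(D)[1]
    # Flip the sign for controls so the estimate is a difference of weighted means
    return np.where(treat == 1, c, -c)

def weighted_smd(x, treat, w):
    """Weighted mean difference divided by the unweighted pooled SD."""
    x = np.asarray(x, dtype=float)
    t, c = treat == 1, treat == 0
    diff = np.sum(w[t] * x[t]) - np.sum(w[c] * x[c])
    pooled_sd = np.sqrt((x[t].var(ddof=1) + x[c].var(ddof=1)) / 2)
    return diff / pooled_sd

# Simulated illustration (assumed data; the propensity score is taken as known)
rng = np.random.default_rng(1)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
ps = 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2)))
treat = rng.binomial(1, ps)
y = 1 + 2 * treat + x1 + x2 + rng.normal(size=n)

# Outcome model: y ~ treat + ps, so the propensity score is the only covariate
w = implied_weights(treat, ps)

# The weighted mean difference in y reproduces the OLS treatment coefficient
tau_weighted = np.sum(w[treat == 1] * y[treat == 1]) - np.sum(w[treat == 0] * y[treat == 0])
D = np.column_stack([np.ones(n), treat, ps])
tau_ols = np.linalg.lstsq(D, y, rcond=None)[0][1]

smd_ps = weighted_smd(ps, treat, w)   # balanced by construction
smd_x1 = weighted_smd(x1, treat, w)   # the informative diagnostic
smd_x2 = weighted_smd(x2, treat, w)
print(tau_weighted, tau_ols, smd_ps, smd_x1, smd_x2)
```

By construction, the weights sum to 1 within each group and exactly balance the mean of anything entered linearly in the model, here the propensity score itself. The informative diagnostics are therefore the weighted SMDs of the original covariates (`smd_x1`, `smd_x2`), which the model balances only indirectly through the propensity score.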


What should you do? Don't use propensity score adjustment except as part of a more sophisticated doubly-robust method. If you want to prove to readers that you have eliminated the association between the treatment and covariates in your sample, then use matching or weighting. If you want to rely on the theoretical properties of the propensity score in a robust outcome model, then use a flexible and doubly-robust method like g-computation with the propensity score as one of many covariates or targeted maximum likelihood estimation (TMLE).