R – Difference Between Normalized Difference and Standardized Mean Difference in Cobalt

causality, covariate-balance, matching, propensity-scores, r

Imbens & Wooldridge (2009, p. 19) define the normalized difference as:

$$\Delta_X = \frac{\bar{X}_1 - \bar{X}_0}{\sqrt{S_1^2 + S_0^2}},$$

where $\bar{X}_k$ and $S_k^2$ are the sample mean and sample variance of the covariate in treatment group $k$,

whereas the cobalt package's standardized mean difference by default (for the ATE) uses "the 'pooled' standard deviation (i.e., the square root of the mean of the group variances) in calculating standardized mean differences" (https://ngreifer.github.io/cobalt/reference/bal.tab.html#with-binary-point-treatments).

What is the difference between using the sum of the sample variances and using the mean of the sample variances in the denominator? Which is preferred?
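
For concreteness, here is a minimal base-R sketch (simulated data and my own variable names, just for illustration) computing both versions; as far as I can tell they differ only by a constant factor of $\sqrt{2}$:

```r
set.seed(123)

# Simulated covariate in a treated and a control group (illustrative only)
x1 <- rnorm(200, mean = 1, sd = 2)    # treated
x0 <- rnorm(300, mean = 0, sd = 1.5)  # control

mean_diff <- mean(x1) - mean(x0)

# Imbens & Wooldridge (2009): square root of the SUM of the group variances
norm_diff <- mean_diff / sqrt(var(x1) + var(x0))

# cobalt's ATE default: square root of the MEAN of the group variances ("pooled" SD)
smd_pooled <- mean_diff / sqrt((var(x1) + var(x0)) / 2)

c(normalized_difference = norm_diff,
  standardized_mean_difference = smd_pooled,
  ratio = smd_pooled / norm_diff)  # ratio is sqrt(2), about 1.414
```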

Best Answer

The difference is that the "normalized difference" in the cited article contains an error, while the standardized mean difference as described in the cobalt documentation and in other articles on assessing balance is correct. The sum of two variances is not a valid way to represent the variability of the covariate's distribution: averaging the group variances keeps the denominator on the scale of the covariate's standard deviation, whereas summing them inflates the denominator by a factor of $\sqrt{2}$ and shrinks the statistic accordingly. This is likely a typo or a misunderstanding by the authors.

The authors write (p. 24):

Imbens and Rubin (forthcoming) suggest as a rule of thumb that with a normalized difference exceeding one quarter, linear regression methods tend to be sensitive to the specification.

Take a look at the normalized difference in Imbens and Rubin (2015, Ch 15.2, p. 339):

$$\Delta_{ct} = \frac{\bar{X}_t - \bar{X}_c}{\sqrt{\left(s_t^2 + s_c^2\right)/2}},$$

where $s_t^2$ and $s_c^2$ are the sample variances of the covariate in the treated and control groups.

This definition of the normalized difference is correct and consistent with that used in cobalt.
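
If you want to check this yourself, here is a minimal sketch using the lalonde dataset bundled with cobalt (assuming bal.tab()'s formula interface and its estimand argument behave as in the documentation linked in the question; the by-hand line is just the pooled-SD formula above):

```r
library(cobalt)
data("lalonde", package = "cobalt")

# Unadjusted standardized mean differences; per the documentation quoted in the
# question, estimand = "ATE" uses the pooled SD (square root of the mean of the
# group variances) as the denominator
bal.tab(treat ~ age + educ + re74 + re75, data = lalonde, estimand = "ATE")

# The same statistic computed by hand for one covariate, using the pooled SD
with(lalonde,
     (mean(age[treat == 1]) - mean(age[treat == 0])) /
       sqrt((var(age[treat == 1]) + var(age[treat == 0])) / 2))
```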