Solved – How to implement covariate balancing propensity score

propensity-scores

I'm using the CBPS package in R to calculate covariate balancing propensity scores following Imai and Ratkovic (2014). However, I'm a bit confused. Are the scores I get from the CBPS package both weights and propensity scores, or just weights? Do I use them only to weight my covariates, or do I match on the closest scores and then weight my covariates?

Best Answer

The CBPS object that is the output of a call to CBPS() contains both $fitted.values, which are the estimated propensity scores, and $weights, which are the estimated weights. You can match on the estimated propensity scores or perform some kind of weighting (e.g., IPW for the ATE, weighting by the odds for the ATT) using either the propensity scores or the weights. In theory, calculating weights from the estimated propensity scores yourself with the corresponding formula should yield the same weights that CBPS() supplies, but I've encountered cases where this is not true, so I trust the estimated propensity scores more.
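The weighting formulas mentioned above (IPW for the ATE, weighting by the odds for the ATT) can be sketched in a few lines. This is a minimal illustration in plain Python, not the CBPS package's own code; the function name `ipw_weights` and the toy values are assumptions made up for the example.

```python
def ipw_weights(treat, ps, estimand="ATE"):
    """Turn estimated propensity scores into weights.

    treat: list of 0/1 treatment indicators
    ps:    list of estimated P(treat = 1 | covariates), strictly in (0, 1)
    """
    weights = []
    for t, p in zip(treat, ps):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if estimand == "ATE":
            # Inverse probability of the treatment actually received.
            w = 1.0 / p if t == 1 else 1.0 / (1.0 - p)
        elif estimand == "ATT":
            # Treated units keep weight 1; controls are weighted by the odds.
            w = 1.0 if t == 1 else p / (1.0 - p)
        else:
            raise ValueError("estimand must be 'ATE' or 'ATT'")
        weights.append(w)
    return weights


# Toy data (made up for illustration).
treat = [1, 1, 0, 0]
ps = [0.8, 0.5, 0.5, 0.2]
ate_w = ipw_weights(treat, ps, "ATE")
att_w = ipw_weights(treat, ps, "ATT")
```

Some software rescales weights (for example, normalizing them within each treatment group), so weights returned by a package may differ from these raw values by a constant factor per group without changing a normalized weighted estimate.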

Glad to see this package being used! I find it vastly outperforms twang and is much faster. If you want to check balance on your covariates, I recommend the cobalt package, which has methods for CBPS objects.

Related Solutions

Solved – Procedure for testing covariate balance for generalized propensity score estimator

The method you describe would be a coarse way to evaluate balance, but a finer way is the following:

For each covariate, compute the correlation between the covariate and the treatment variable after conditioning. If it is 0, then the variable will no longer confound the estimate of the treatment effect. Calculating standardized mean differences in the context of binary treatments essentially examines the same thing. Fong, Hazlett, and Imai (2015) consider continuous treatments and compute the absolute Pearson correlations between covariates and treatment to establish balance.

It would also be a good idea to evaluate the correlation between treatment and the squared and other polynomial and interaction terms of the covariates. You want all these to be as close to 0 as possible. In general, you want treatment to be independent from the covariates, so you can use whatever methods are appropriate to determine this (e.g., visually examining scatterplots, etc.).
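As a rough sketch of the check described above, the following plain-Python snippet computes the absolute weighted correlation between a continuous treatment and each covariate, along with the covariate's square. The function names and toy data are hypothetical; in practice the weights would come from your fitted model. The toy data are constructed so that the linear correlation is zero while the correlation with the squared term is not, which is why the higher-order terms are worth checking.

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    # Assumes neither variable is constant (nonzero variance).
    return cov / math.sqrt(var_x * var_y)


def balance_correlations(treatment, covariates, weights):
    """Absolute treatment-covariate correlations for each covariate and its square."""
    result = {}
    for name, values in covariates.items():
        squared = [v * v for v in values]
        result[name] = abs(weighted_corr(treatment, values, weights))
        result[name + "^2"] = abs(weighted_corr(treatment, squared, weights))
    return result


# Toy data (made up): the treatment is a quadratic function of x.
treatment = [4.0, 1.0, 0.0, 1.0, 4.0]
covariates = {"x": [-2.0, -1.0, 0.0, 1.0, 2.0]}
weights = [1.0] * 5  # stand-in for estimated weights; all equal here

balance = balance_correlations(treatment, covariates, weights)
```

Here `balance["x"]` is essentially 0 even though the treatment depends entirely on x, while `balance["x^2"]` is essentially 1, so looking only at the linear term would miss the imbalance.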

The method you describe from Guo & Fraser is effective in theory, and of course would be approximately equivalent in large samples with many subclasses. It would actually be superior, because you aren't limited to polynomial correlations (for the same reason subclassification on the PS is superior to covariate adjustment with the PS: you don't have to assume the functional form of the relationship). The problem is that it coarsens your treatment into 5 categories when the treatment is actually continuous, so independence should hold over the whole distribution, not just within subclasses.

Also, although they recommend it, avoid using hypothesis tests of any kind for balance assessment. Balance can become conflated with power when using them.

If you're using R for propensity score analysis, consider the cobalt package for assessing balance. In the next release, balance assessment for continuous treatment will be implemented. [Edit: it can now assess balance for continuous treatments.]

Solved – Difference between using a propensity score for matching vs. regression analysis

There are four common ways of using propensity scores (PS) to reduce confounding and arrive at an unbiased estimate of a causal effect. These are PS matching, PS weighting, PS subclassification, and regression on the PS. There have been systematic studies on the relative performance of these methods, but new variations of them always come out and it's not always immediately clear whether the new methods will adhere to the hierarchy of methods commonly assumed. Each method has its strengths and weaknesses. They vary mostly in their empirically untestable assumptions and in their empirical performance.

I'll only talk about PS matching and regression on the PS. One important concept to know is the relationship between the PS and the outcome. We have to assume there is some functional relationship, so that in theory we could model the outcome "correctly" as a function of treatment and the propensity score.

PS Matching

PS matching involves estimating a PS for each unit (usually using logistic regression), then, for each treated unit, finding one or more control units with a similar PS, and discarding the unmatched control units. Variations include how many control units to match, whether they should be matched with or without replacement, whether a caliper should be used, etc. For now we'll assume 1:1 matching without replacement and without a caliper. PS matching can really only be used to estimate the average treatment effect on the treated (ATT). If the effect of treatment is the same for everyone, then this quantity is equal to the average treatment effect in the population (ATE).

With a correctly modeled propensity score and exact matching on the propensity score, the treatment effect estimate for 1:1 PS matching will be unbiased. What's nice about this is that it doesn't matter what the relationship between the outcome and the PS is; if exact matching on the PS is performed, the effect estimate will be unbiased, as guaranteed by Rosenbaum & Rubin (1983). In practice, though, the PS is not modeled correctly, and it's impossible to do exact matching on the PS, so these properties hold only approximately. To assess the plausibility of unbiased estimation in your sample, you can check balance on the covariates after matching. If they are balanced and if the outcome is a linear combination of the covariates that are balanced, then the effect estimate will be unbiased even if the PS model is incorrect or the matching is inexact. This is the "propensity score tautology", which says that if the propensity score yields balance, then it is a valid propensity score, because the purpose of the propensity score is to create balance. Definitely read Ho et al. (2007), who describe the tautology along with many other subtle points that may broaden your understanding.
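The matching-then-balance-check workflow described above can be sketched as follows. This is a simplified greedy implementation of 1:1 nearest-neighbor matching without replacement and without a caliper, written in plain Python for illustration; it is not how any particular package implements matching. The function names and data are made up, and the standardized mean difference here divides by the treated group's standard deviation, which is one common convention for the ATT.

```python
import statistics


def match_nearest(ps_treated, ps_control):
    """Greedy 1:1 nearest-neighbor matching on the PS, without replacement.

    Returns a list of (treated_index, control_index) pairs. Treated units
    are processed in the order given, which can affect the result.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break  # more treated than controls: the rest stay unmatched
        # Closest remaining control; ties broken by the lower index.
        j = min(available, key=lambda k: (abs(ps_control[k] - p), k))
        available.remove(j)
        pairs.append((i, j))
    return pairs


def std_mean_diff(x_treated, x_control, sd):
    """Difference in covariate means divided by a fixed standard deviation."""
    return (statistics.mean(x_treated) - statistics.mean(x_control)) / sd


# Toy data (made up for illustration).
ps_treated = [0.70, 0.55]
ps_control = [0.20, 0.50, 0.72, 0.30]
age_treated = [50, 42]
age_control = [25, 40, 52, 30]

pairs = match_nearest(ps_treated, ps_control)
matched_age_control = [age_control[j] for _, j in pairs]

# Use the same standardization factor before and after matching.
sd_treated = statistics.stdev(age_treated)
smd_before = std_mean_diff(age_treated, age_control, sd_treated)
smd_after = std_mean_diff(age_treated, matched_age_control, sd_treated)
```

In this toy example the unmatched controls (indices 0 and 3) are discarded, and the standardized mean difference on age drops from well above 1 before matching to 0 after, which is the kind of improvement a balance check is meant to reveal.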

Regression on the PS

Regression on the PS involves estimating a PS, and then regressing the outcome on the PS and the treatment. Regressing the outcome on the PS is one way of "conditioning" on the PS, and thanks to Rosenbaum & Rubin (1983), we know that the treatment effect estimate is unbiased conditional on the PS. Variations include regression on the linear PS and other transformations, flexible modeling of the PS and/or outcome with splines, and using Bayesian additive regression trees (BART) with a PS to generate estimated potential outcomes. For now, we'll assume linear regression on the PS.

With a correctly modeled PS and a correctly modeled functional relationship between the outcome and the PS, it is possible to estimate an average treatment effect. If the treatment effect is the same for all units, the coefficient on the treatment is the ATE. Otherwise, you can estimate a treatment effect for each possible PS value (see this post). The average of these treatment effects, each estimated at the unit's own PS, is the ATE. In practice, though, the PS is not modeled correctly, and the functional relationship between the outcome and the treatment and PS is not modeled correctly. A common error is to omit the interaction between the PS and the treatment from the outcome model. In these cases, there is no reason to expect the treatment effect estimate to be even approximately unbiased. For this reason, many methodologists recommend against using regression on the PS unless using a flexible model like BART. You cannot assess the degree to which regression on the PS will remove bias (i.e., the concept of balance doesn't exactly apply, so you can't check balance), so regression on the PS requires very strong assumptions which are almost guaranteed to be false in real data.
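To make the role of the interaction concrete, here is a sketch under the assumption that the outcome is linear in the treatment T, the PS e, and their product. The symbols and the linear form are chosen for illustration only; a real outcome model may need to be more flexible.

```latex
\begin{align*}
E[Y \mid T, e] &= \beta_0 + \beta_1 T + \beta_2 e + \beta_3 T e \\
\tau(e) &= E[Y \mid T = 1, e] - E[Y \mid T = 0, e] = \beta_1 + \beta_3 e \\
\widehat{\mathrm{ATE}} &= \frac{1}{n} \sum_{i=1}^{n} \hat{\tau}(\hat{e}_i)
  = \hat{\beta}_1 + \hat{\beta}_3 \cdot \frac{1}{n} \sum_{i=1}^{n} \hat{e}_i
\end{align*}
```

When the interaction is left out (forcing \beta_3 = 0), the model assumes the same effect at every PS value, and the coefficient on T is read directly as the ATE. If the effect actually varies with the PS, that coefficient is generally not the ATE.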

Overall, matching is superior to regression from a theoretical point of view and empirically, but matching itself has many weaknesses as well. You might want to look into methods that don't rely strictly on the propensity score, such as BART and targeted minimum loss-based estimation (TMLE), which require few assumptions and tend to yield unbiased estimates.