Difference between using a propensity score for matching vs. regression analysis

Tags: analysis, multivariate analysis, propensity-scores, regression

So I am confused about the difference between matching patients on propensity scores and using the propensity score in a multivariate regression analysis. Is there a difference? With one, you match 1:1 and discard those that don't match, and with the other, you stratify patients based on weights? Confused… need help.

Best Answer

There are four common ways of using propensity scores (PS) to reduce confounding and arrive at an unbiased estimate of a causal effect: PS matching, PS weighting, PS subclassification, and regression on the PS. There have been systematic studies of the relative performance of these methods, but new variations keep appearing, and it's not always immediately clear whether they will follow the commonly assumed ranking of methods. Each method has its strengths and weaknesses; they differ mostly in their empirically untestable assumptions and in their empirical performance.

I'll only talk about PS matching and regression on the PS. One important concept to know is the relationship between the PS and the outcome. We have to assume there is some functional relationship, so that in theory we could model the outcome "correctly" as a function of treatment and the propensity score.

PS Matching

PS matching involves estimating a PS for each unit (usually with logistic regression), then finding, for each treated unit, one or more control units with a similar PS, and discarding the unmatched control units. Variations include how many control units to match to each treated unit, whether they should be matched with or without replacement, whether a caliper should be used, etc. For now, we'll assume 1:1 matching without replacement and without a caliper. PS matching can really only be used to estimate the average treatment effect on the treated (ATT). If the effect of treatment is the same for everyone, then this quantity is equal to the average treatment effect in the population (ATE).
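
To make the procedure concrete, here is a minimal Python/NumPy sketch of 1:1 nearest-neighbor PS matching without replacement and without a caliper. Everything in it is an assumption for illustration: the simulated data (two covariates, a constant treatment effect of 2), the hand-rolled Newton-Raphson logistic regression, and the hypothetical helper names `fit_logistic` and `match_1to1`. The matching is greedy (treated units are processed in index order), which is only one of several ways to do it, and the quantity estimated is the ATT.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Fit P(t=1 | X) by logistic regression (Newton-Raphson) and return fitted PS."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        # Newton step: beta += (X' W X)^{-1} X' (t - p)
        beta += np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (t - p))
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def match_1to1(ps, t):
    """Greedy 1:1 nearest-neighbor matching on the PS, without replacement or caliper.

    Returns a list of (treated_index, control_index) pairs; unmatched controls are dropped.
    """
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    if len(controls) < len(treated):
        raise ValueError("need at least as many controls as treated units")
    used = np.zeros(len(controls), dtype=bool)
    pairs = []
    for i in treated:
        dist = np.abs(ps[controls] - ps[i])
        dist[used] = np.inf  # a control already matched cannot be reused
        k = int(np.argmin(dist))
        used[k] = True
        pairs.append((int(i), int(controls[k])))
    return pairs

# Hypothetical simulated data: two confounders, constant treatment effect of 2.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_ps = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, true_ps)
y = 2.0 * t + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

ps = fit_logistic(X, t)                             # estimated PS
pairs = match_1to1(ps, t)                           # 1:1 matched pairs
att = np.mean([y[i] - y[j] for i, j in pairs])      # mean within-pair difference
naive = y[t == 1].mean() - y[t == 0].mean()         # unadjusted comparison
print(f"naive difference: {naive:.2f}, matched ATT: {att:.2f}")
```

Because the effect is constant in this simulation, the ATT equals the ATE, so the matched estimate should land near 2, while the naive difference in means is distorted by confounding.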

With a correctly modeled propensity score and exact matching on the propensity score, the treatment effect estimate for 1:1 PS matching will be unbiased. What's nice about this is that it doesn't matter what the relationship between the outcome and the PS is; if exact matching on the PS is performed, the effect estimate will be unbiased, as guaranteed by Rosenbaum & Rubin (1983). In practice, though, the PS model is never exactly correct and exact matching on the PS is impossible, so these properties hold only approximately. To assess the plausibility of unbiased estimation in your sample, you can check balance on the covariates after matching. If the covariates are balanced and the outcome is a linear combination of those balanced covariates, the effect estimate will be unbiased even if the PS model is incorrect or the matching is inexact. This is the "propensity score tautology": if the propensity score yields balance, then it is a valid propensity score, because the purpose of the propensity score is to create balance. Definitely read Ho et al. (2007), who describe the tautology along with many other subtle points that may broaden your understanding.
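
Checking balance usually means comparing the standardized mean difference (SMD) of each covariate before and after matching. The sketch below uses a hypothetical toy data set (one covariate, age) and a hypothetical set of matched indices standing in for the output of a matching step. The denominator, the SD of the treated group in the full sample, is one common choice when targeting the ATT; the important part is using the same denominator before and after matching.

```python
import numpy as np

def smd(x, t, scale_sd):
    """Standardized mean difference: (mean treated - mean control) / scale_sd."""
    return (x[t == 1].mean() - x[t == 0].mean()) / scale_sd

# Hypothetical toy data: 4 treated and 7 control patients, one covariate (age).
age = np.array([60, 65, 70, 75, 40, 45, 50, 59, 64, 69, 76], dtype=float)
t = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

# Hypothetical output of a 1:1 matching step: the 4 treated units plus the
# 4 controls they were paired with; the other 3 controls are discarded.
matched = np.array([0, 1, 2, 3, 7, 8, 9, 10])

# Same denominator before and after matching: SD of the treated group (full sample).
scale = age[t == 1].std(ddof=1)

before = smd(age, t, scale)
after = smd(age[matched], t[matched], scale)
print(f"SMD before matching: {before:.3f}, after matching: {after:.3f}")
```

A rule of thumb often used in practice treats |SMD| < 0.1 as adequate balance; in this toy example the SMD drops from about 1.5 to under 0.1.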

Regression on the PS

Regression on the PS involves estimating a PS and then regressing the outcome on the PS and the treatment. Regressing the outcome on the PS is one way of "conditioning" on the PS, and thanks to Rosenbaum & Rubin (1983), we know that the treatment effect estimate is unbiased conditional on the PS. Variations include regression on the linear PS (the logit of the PS) and other transformations, flexible modeling of the PS and/or outcome with splines, and using Bayesian additive regression trees (BART) with a PS to generate estimated potential outcomes. For now, we'll assume linear regression on the PS.
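
A minimal sketch of linear regression on the PS, under strong simplifying assumptions: simulated data in which the outcome is exactly linear in the PS with a constant treatment effect of 2, and the true PS used directly instead of an estimated one (in practice you would estimate it, e.g. with logistic regression). Because the outcome model is correct by construction, the coefficient on the treatment recovers the effect; with real data, neither assumption is guaranteed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x)))  # true PS; normally this is estimated
t = rng.binomial(1, ps)
# Assumed outcome model: linear in the PS, constant treatment effect of 2.
y = 2.0 * t + 3.0 * ps + rng.normal(size=n)

# OLS of y on an intercept, the treatment indicator, and the PS.
D = np.column_stack([np.ones(n), t, ps])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
print(f"coefficient on treatment: {coef[1]:.2f}")
```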

With a correctly modeled PS and a correctly modeled functional relationship between the outcome and the PS, it is possible to estimate an average treatment effect. If the treatment effect is the same for all units, the coefficient on the treatment is the ATE. Otherwise, you can estimate a treatment effect at each possible PS value (see this post); averaging these effects over all units, each evaluated at its own PS, gives the ATE. In practice, though, neither the PS nor the functional relationship between the outcome, the treatment, and the PS is modeled correctly. A common error is to omit an interaction between the PS and the treatment from the outcome model. In these cases, there is no reason to expect the treatment effect estimate to be even approximately unbiased. For this reason, many methodologists recommend against regression on the PS unless a flexible model like BART is used. You also cannot assess how much bias regression on the PS removes (the concept of balance doesn't really apply, so there is nothing to check), so regression on the PS requires very strong assumptions that are almost guaranteed to be false in real data.
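
The following sketch, with the same caveats (simulated data, true PS used for brevity, and an assumed effect that varies with the PS, `tau(ps) = 1 + 2 * ps`), contrasts an outcome model with a treatment-by-PS interaction against one without it. With the interaction, the estimated effect at each unit's own PS is `b[1] + b[3] * ps`; averaging over all units estimates the ATE, and averaging over treated units only estimates the ATT. The helper name `ols` is hypothetical.

```python
import numpy as np

def ols(D, y):
    """Ordinary least squares coefficients of y on design matrix D."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-(1.0 + 1.5 * x)))  # true PS, used directly for brevity
t = rng.binomial(1, ps)
# Assumed effect that varies with the PS: tau(ps) = 1 + 2 * ps.
y = t * (1.0 + 2.0 * ps) + 3.0 * ps + rng.normal(size=n)
true_ate = np.mean(1.0 + 2.0 * ps)  # sample average of the true unit-level effects

ones = np.ones(n)
# Outcome model WITH a treatment-by-PS interaction: y ~ 1 + t + ps + t*ps
b = ols(np.column_stack([ones, t, ps, t * ps]), y)
unit_effects = b[1] + b[3] * ps          # estimated effect at each unit's own PS
ate_hat = unit_effects.mean()            # average over everyone: ATE
att_hat = unit_effects[t == 1].mean()    # average over treated units: ATT

# Outcome model WITHOUT the interaction (the common error): y ~ 1 + t + ps
b_no = ols(np.column_stack([ones, t, ps]), y)

print(f"true ATE {true_ate:.2f} | interaction model: ATE {ate_hat:.2f}, "
      f"ATT {att_hat:.2f} | no-interaction coefficient {b_no[1]:.2f}")
```

Without the interaction, the single treatment coefficient is a weighted average of the PS-specific effects that generally does not equal the ATE when effects vary with the PS. Here the interaction model is correct by construction; with real data the functional form is unknown.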

Overall, matching is superior to regression on the PS both theoretically and empirically, but matching itself has many weaknesses as well. You might want to look into methods that don't rely strictly on the propensity score, such as BART and targeted minimum loss-based estimation (TMLE), which require few assumptions and tend to yield unbiased estimates.
