After reading the section on variable selection for population-level effect estimation in OHDSI, I set out to add covariates to my process. As suggested, I began exploring regularized regression (elastic net) for what I eventually realized was basic feature selection.
To clarify: the goal of my PSM implementation is to control for confounding in the estimation of the ATT and, eventually, ITEs.
When I've approached this topic before, the guidance seemed unclear (to give a single example) and ultimately descended into a larger debate about using the preprocessing and model-evaluation techniques common in prediction tasks versus what I'm actually trying to do: control for confounding for causal inference. Even Stack Exchange suggested I start there.
But my question is: what strikes the best balance? Does increased PS "accuracy" better control for confounding? More specifically:
- Should highly correlated features be removed before estimating the propensity score? After all, PSM is, at its core, a logistic regression. Yet that step does not appear to be common.
- If we are, or should be, performing variable selection, are all methodologies on the table? It would seem we'd want to use the highest-performing method, be it RFE with decision trees or regularized regression.
- Per the above, does this require including goodness-of-fit / discrimination metrics for the resulting PS in subsequent reports to "validate" the selected features? I can't imagine this being helpful to any but the most esoteric among us.
- If the above are true, shouldn't we also consider more advanced algorithms such as neural nets* for PS specification? This would also alleviate some of the correlation/selection issues above, yet it does not seem to be a popular method.
Ultimately, if the resulting population is prognostically balanced, does a better PS specification matter? PS matching seems to occupy a gray area in its use of regression. I've been content to explain away some of this incongruity as "prediction is not the goal; controlling for confounding is" and put it to bed with the larger "explain vs. predict" conversation. I'm simply trying to identify the most robust process without being superfluous.
*Please don't mistake me for someone trying to throw a neural net at a simple regression problem.
Best Answer
The goal of propensity score matching (PSM) is to adjust for confounding by achieving covariate balance on a sufficient set of covariates required to nonparametrically identify the causal effect. Covariate balance is the degree to which the treatment is independent of the covariates, or, equivalently, how similar the covariate distributions are between the treatment groups. The set of variables required to nonparametrically identify the causal effect (i.e., a sufficient adjustment set) is a theoretical matter that cannot be decided by statistical modeling and requires the use of substantive beliefs about the relationship among the treatment, outcome, and covariates. I discuss some of that here.
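For concreteness, covariate balance is usually quantified with standardized mean differences (SMDs). Here is a minimal Python sketch on made-up data (in R, cobalt computes these for you):

```python
import numpy as np

def smd(x, treat):
    """Standardized mean difference, pooling the two groups' SDs
    as in Austin (2009). |SMD| < 0.1 is a common balance threshold."""
    x, treat = np.asarray(x, float), np.asarray(treat, bool)
    x1, x0 = x[treat], x[~treat]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return (x1.mean() - x0.mean()) / pooled_sd

# Toy data: treated units are systematically older (a confounded covariate).
rng = np.random.default_rng(0)
treat = rng.random(1000) < 0.4
age = rng.normal(50 + 5 * treat, 10)
print(round(smd(age, treat), 2))  # well above 0.1: age is imbalanced
```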
Propensity scores and the models used to estimate them are not to be interpreted, and therefore do not need to be parsimonious. Unlike most prediction tasks, propensity score estimation is not about achieving the best accuracy in predicting probabilities of class membership; rather, it is about finding propensity scores that yield the best balance. Therefore, propensity score models should be evaluated not on their predictive performance but on their ability to achieve balance.
There are many ways to estimate propensity scores, and none can be known to be superior at the outset; the best one is the one that achieves the best balance, so many should be tried. It may be that an elastic net propensity score model yields the best balance, but the fact that such a model performs variable selection is irrelevant. It does not tell you anything about which variables need to be controlled for and balanced by the matching. It is solely one of many possible propensity score models. A variable selected out of the final propensity score model is not a variable that no longer needs to be adjusted for; it is just a variable that, when removed, yields the best-performing model. When there are very many covariates and a small treatment group, it is often the case that the best-performing models will involve regularization or variable selection. But the results of such models do not inform on any substantive issue related to the problem at hand.
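A made-up illustration of this point: even when an elastic net zeroes out a coefficient, the dropped covariate still has to be checked for balance after weighting or matching.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated data: five covariates; two drive treatment strongly,
# one weakly, and two not at all.
rng = np.random.default_rng(2)
n = 1500
x = rng.normal(size=(n, 5))
treat = rng.random(n) < 1 / (1 + np.exp(-(x @ [1.0, 0.6, 0.1, 0.0, 0.0])))

# Elastic net PS model (saga is sklearn's solver for this penalty).
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.9, C=0.01, max_iter=5000).fit(x, treat)
dropped = np.flatnonzero(enet.coef_[0] == 0)
print("selected out of the PS model:", dropped)

# Balance must still be assessed on ALL covariates -- including the
# dropped ones -- not by inspecting the model's coefficients.
ps = enet.predict_proba(x)[:, 1]
w = np.where(treat, 1 / ps, 1 / (1 - ps))  # IPW weights for illustration
for j in range(5):
    m1 = np.average(x[treat, j], weights=w[treat])
    m0 = np.average(x[~treat, j], weights=w[~treat])
    print(f"x{j}: weighted mean difference = {m1 - m0:+.3f}")
```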
With this in mind, I will answer your four questions:

1. No. Highly correlated features do not need to be removed before estimating the propensity score. The propensity score model is not interpreted, so collinearity among its predictors is not itself a problem; what matters is the balance the resulting scores achieve.
2. Yes, all methodologies are on the table, but "highest performing" should be judged by the covariate balance achieved after matching, not by predictive performance.
3. No. Goodness-of-fit and discrimination metrics for the resulting PS are irrelevant; what should be reported are balance statistics (Austin, 2009). See the cobalt documentation for good balance metrics to report.
4. Yes, neural networks can be used to estimate propensity scores (Collier et al., 2021), and this is implemented in MatchIt, so it is available. It is also possible to use stacking methods like SuperLearner to compute propensity scores (Alam et al., 2019). For some problems, simple models like logistic regression may perform well; for others, more complex models or models that involve regularization or variable selection may perform better. Some matching methods don't even involve the propensity score, like cardinality matching and coarsened exact matching (both implemented in MatchIt).

Alam, S., Moodie, E. E. M., & Stephens, D. A. (2019). Should a propensity score model be super? The utility of ensemble procedures for causal adjustment. Statistics in Medicine, 38(9), 1690–1702. https://doi.org/10.1002/sim.8075
Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28(25), 3083–3107. https://doi.org/10.1002/sim.3697
Collier, Z. K., Leite, W. L., & Zhang, H. (2021). Estimating propensity scores using neural networks and traditional methods: A comparative simulation study. Communications in Statistics - Simulation and Computation, 0(0), 1–16. https://doi.org/10.1080/03610918.2021.1963455