From a statistical perspective, can one infer causality using propensity scores with an observational study?


Question: From the standpoint of a statistician (or a practitioner), can one infer causality using propensity scores in an observational study (as opposed to an experiment)?

Please note that I do not want to start a flame war or a fanatical debate.

Background: Within our stat PhD program, we've only touched on causal inference through working groups and a few topic sessions. However, there are some very prominent researchers in other departments (e.g. HDFS, Sociology) who are actively using propensity scores.

I've already witnessed some pretty heated debate on this issue, and it is not my intention to start one here. That said, what references have you encountered? What viewpoints do you have? For example, one argument I've heard against propensity scores as a causal inference technique is that one can never infer causality because of omitted variable bias: if you leave out something important, you break the causal chain. Is this an unresolvable problem?

Disclaimer: This question may not have a correct answer. I'm completely fine with it being made community wiki (CW), but I'm personally very interested in the responses and would be happy with a few good references that include real-world examples.

Best Answer

At the beginning of an article aimed at promoting the use of PSs in epidemiology, Oakes and Church (1) cited Hernán and Robins's claims about confounding effects in epidemiology (2):

Can you guarantee that the results from your observational study are unaffected by unmeasured confounding? The only answer an epidemiologist can provide is ‘no’.

This does not just mean that we cannot ensure that results from observational studies are unbiased (which does not make them useless: as @propofol said, their results can be useful for designing RCTs), but also that PSs certainly do not offer a complete solution to this problem, or at least do not necessarily yield better results than other matching or multivariate methods (see e.g. (10)).

Propensity scores (PSs) are, by construction, probabilistic, not causal, indicators. The choice of the covariates that enter the propensity score model is key to its reliability, and, as has been said, their main weakness stems from not controlling for unobserved confounders (which is quite likely in retrospective or case-control studies). Other factors have to be considered: (a) model misspecification will affect direct effect estimates (though not really more than in the OLS case), (b) there may be missing data in the covariates, and (c) PSs do not overcome synergistic effects, which are known to affect causal interpretation (8,9).
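To make the unobserved-confounder point concrete, here is a minimal simulation sketch in Python. Everything in it is a hypothetical assumption chosen for illustration: a data-generating process with one observed binary covariate X and one unobserved binary confounder U, a true treatment effect of 1.0, and a saturated propensity-score model (the treated fraction within each covariate cell) standing in for the logistic regression usually fitted in practice. The estimated PS is then plugged into a normalized inverse-probability-weighted (IPW) estimate of the average treatment effect.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Hypothetical DGP: X observed, U unobserved confounder, true effect of T on Y is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        u = rng.random() < 0.5
        # Treatment probability depends on both X and U (U confounds T -> Y).
        t = rng.random() < 0.2 + 0.3 * x + 0.3 * u
        y = 1.0 * t + 2.0 * x + 2.0 * u + rng.gauss(0.0, 1.0)
        rows.append((int(x), int(u), int(t), y))
    return rows


def cell_propensity(rows, key):
    """Saturated PS model: P(T=1 | covariate cell) = treated fraction in that cell."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        k = key(row)
        total[k] += 1
        treated[k] += row[2]
    return {k: treated[k] / total[k] for k in total}


def ipw_ate(rows, key):
    """Normalized (Hajek) inverse-probability-weighted estimate of the average treatment effect."""
    ps = cell_propensity(rows, key)
    num1 = den1 = num0 = den0 = 0.0
    for row in rows:
        _, _, t, y = row
        e = ps[key(row)]
        if t:
            num1 += y / e
            den1 += 1 / e
        else:
            num0 += y / (1 - e)
            den0 += 1 / (1 - e)
    return num1 / den1 - num0 / den0


rows = simulate(200_000)
ate_x_only = ipw_ate(rows, key=lambda r: (r[0],))        # U omitted from the PS model
ate_x_and_u = ipw_ate(rows, key=lambda r: (r[0], r[1]))  # U included (possible only in a simulation)
print(f"PS on X only:  {ate_x_only:.2f}")
print(f"PS on X and U: {ate_x_and_u:.2f}")
```

Under these assumptions, the estimate that omits U should come out near 1.66 rather than 1.0: within each level of X, treated units are more likely to have U = 1, and no weighting scheme can correct for a variable it never sees. Including U brings the estimate back to roughly 1.0, but only because the simulation lets us observe U; in a real observational study, that is exactly the step that cannot be guaranteed.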

As for references, I found Roger Newson's slides, "Causality, confounders, and propensity scores", relatively well-balanced about the pros and cons of using propensity scores, with illustrations from real studies. There were also several good papers discussing the use of propensity scores in observational studies or environmental epidemiology two years ago in Statistics in Medicine, and I list some of them at the end (3-6). I also like Pearl's review (7) because it offers a broader perspective on causality issues (PSs are discussed on pp. 117 and 130). Obviously, you will find many more illustrations by looking at applied research. I would like to add two recent articles by William R. Shadish that I came across on Andrew Gelman's website (11,12). The use of propensity scores is discussed, but the two papers focus more broadly on causal inference in observational studies (and how it compares to randomized settings).

References

  1. Oakes, J.M. and Church, T.R. (2007). Invited Commentary: Advancing Propensity Score Methods in Epidemiology. American Journal of Epidemiology, 165(10), 1119-1121.
  2. Hernán, M.A. and Robins, J.M. (2006). Instruments for causal inference: an epidemiologist's dream? Epidemiology, 17, 360-372.
  3. Rubin, D.B. (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Statistics in Medicine, 26, 20-36.
  4. Shrier, I. (2008). Letter to the editor. Statistics in Medicine, 27, 2740–2741.
  5. Pearl, J. (2009). Remarks on the method of propensity score. Statistics in Medicine, 28, 1415–1424.
  6. Stuart, E.A. (2008). Developing practical recommendations for the use of propensity scores: Discussion of ‘A critical appraisal of propensity score matching in the medical literature between 1996 and 2003’ by Peter Austin. Statistics in Medicine, 27, 2062–2065.
  7. Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96-146.
  8. Oakes, J.M. and Johnson, P.J. (2006). Propensity score matching for social epidemiology. In Methods in Social Epidemiology, J.M. Oakes and J.S. Kaufman (Eds.), pp. 364-386. Jossey-Bass.
  9. Höfler, M. (2005). Causal inference based on counterfactuals. BMC Medical Research Methodology, 5, 28.
  10. Winkelmayer, W.C. and Kurth, T. (2004). Propensity scores: help or hype? Nephrology Dialysis Transplantation, 19(7), 1671-1673.
  11. Shadish, W.R., Clark, M.H., and Steiner, P.M. (2008). Can Nonrandomized Experiments Yield Accurate Answers? A Randomized Experiment Comparing Random and Nonrandom Assignments. JASA, 103(484), 1334-1356.
  12. Cook, T.D., Shadish, W.R., and Wong, V.C. (2008). Three Conditions under Which Experiments and Observational Studies Produce Comparable Causal Estimates: New Findings from Within-Study Comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.