Question: From the standpoint of a statistician (or a practitioner), can one infer causality using propensity scores with an observational study (not an experiment)?
Please, I do not want to start a flame war or a fanatical debate.
Background: Within our stat PhD program, we've only touched on causal inference through working groups and a few topic sessions. However, there are some very prominent researchers in other departments (e.g. HDFS, Sociology) who are actively using propensity scores.
I've already witnessed some pretty heated debate on this issue. It is not my intention to start one here. That said, what references have you encountered? What viewpoints do you have? For example, one argument I've heard against propensity scores as a causal inference technique is that one can never infer causality due to omitted variable bias: if you leave out something important, you break the causal chain. Is this an unresolvable problem?
Disclaimer: This question may not have a correct answer — completely cool with clicking cw, but I'm personally very interested in the responses and would be happy with a few good references which include real-world examples.
Best Answer
At the beginning of an article promoting the use of PSs in epidemiology, Oakes and Church (1) cited Hernán and Robins's claims about confounding effects in epidemiology (2):
This is not only to say that we cannot guarantee that results from observational studies are unbiased (they are not useless: as @propofol said, they can be useful for designing RCTs), but also that PSs certainly do not offer a complete solution to this problem, or at least do not necessarily yield better results than other matching or multivariate methods (see e.g. (10)).
Propensity scores (PSs) are, by construction, probabilistic, not causal, indicators. The choice of the covariates that enter the propensity score function is key to its reliability, and its main weakness, as has been said, stems from not controlling for unobserved confounders (which are quite likely in retrospective or case-control studies). Other factors have to be considered: (a) model misspecification will impact direct effect estimates (not really more than in the OLS case, though), (b) there may be missing data at the level of the covariates, (c) PSs do not overcome synergistic effects, which are known to affect causal interpretation (8,9).
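To make the unobserved-confounder point concrete, here is a minimal simulation sketch. It is not taken from any of the cited papers: the data-generating process, the names `simulate` and `ipw_ate`, and all the numbers are assumptions chosen for illustration. Two binary confounders, `x` and `u`, affect both treatment and outcome, and the true treatment effect is fixed at 1.0. The propensity score is estimated nonparametrically as the treated fraction within each covariate stratum (a swapped-in simplification; applied work usually fits a logistic regression), and the effect is estimated by normalized inverse probability weighting.

```python
import random
from collections import namedtuple

# Hypothetical unit record: x and u are binary confounders,
# t is the treatment indicator, y is the outcome.
Unit = namedtuple("Unit", ["x", "u", "t", "y"])

TRUE_EFFECT = 1.0  # assumed treatment effect in this toy model


def simulate(n=20000, seed=0):
    """Generate observational data in which x and u confound t and y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        u = int(rng.random() < 0.5)
        # Treatment is more likely when x or u equals 1.
        p_treat = 0.2 + 0.3 * x + 0.3 * u
        t = int(rng.random() < p_treat)
        # Outcome depends on treatment and on both confounders.
        y = TRUE_EFFECT * t + 2.0 * x + 2.0 * u + rng.gauss(0.0, 1.0)
        rows.append(Unit(x, u, t, y))
    return rows


def ipw_ate(rows, key):
    """Normalized IPW estimate of the average treatment effect.

    `key` maps a unit to the covariates that enter the propensity
    score; the score itself is the treated fraction in each stratum.
    """
    counts = {}
    for r in rows:
        n_treated, n_all = counts.get(key(r), (0, 0))
        counts[key(r)] = (n_treated + r.t, n_all + 1)
    ps = {k: n_treated / n_all for k, (n_treated, n_all) in counts.items()}

    num1 = den1 = num0 = den0 = 0.0
    for r in rows:
        e = ps[key(r)]
        if r.t:
            w = 1.0 / e          # weight for treated units
            num1 += w * r.y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)  # weight for control units
            num0 += w * r.y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate()
# Propensity score built from both confounders.
print("PS on (x, u):", round(ipw_ate(rows, key=lambda r: (r.x, r.u)), 2))
# Propensity score that omits u, as if u had never been measured.
print("PS on x only:", round(ipw_ate(rows, key=lambda r: r.x), 2))
```

Under these assumptions, the estimate that conditions on both confounders should land close to the true effect of 1.0, while the one that omits `u` should sit well above it. Nothing in the observed data signals that the second analysis is wrong, which is the core of the omitted-variable objection.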
As for references, I found Roger Newson's slides -- Causality, confounders, and propensity scores -- relatively well-balanced about the pros and cons of using propensity scores, with illustrations from real studies. There were also several good papers discussing the use of propensity scores in observational studies or environmental epidemiology two years ago in Statistics in Medicine, and I enclose a couple of them at the end (3-6). But I like Pearl's review (7) because it offers a larger perspective on causality issues (PSs are discussed on p. 117 and p. 130). Obviously, you will find many more illustrations by looking at applied research. I would like to add two recent articles by William R Shadish that I came across on Andrew Gelman's website (11,12). The use of propensity scores is discussed, but the two papers focus more broadly on causal inference in observational studies (and how it compares to randomized settings).
References