Causal Inference – Difference Between Propensity Score Matching and CausalImpact

Tags: causalimpact, causality, observational-study, propensity-scores, time-series

I'm investigating causal effects in some financial data using two different approaches: propensity score matching with stratification, and the CausalImpact package for Bayesian structural time series. Theoretically, should propensity score matching and CausalImpact give similar results? I know the methodologies differ, but let's assume appropriate data is used for each: for the former, features of the individuals in the treatment and control groups; for the latter, an aggregate time series for the treatment group along with several covariate time series.

My concern lies in how counterfactuals are computed for the treated group. In propensity score matching with stratification, the treated population is split into bins, and counterfactuals are calculated per bin and then combined with a weighted average. For CausalImpact, a single counterfactual is predicted on the whole treatment group's time series. Could this be problematic? For instance, maybe it's easier to predict counterfactuals per bin than the entire group at once, or maybe there's a Simpson's paradox-type phenomenon where we observe positive causal effect per bin but not in the whole time series, so the two methodologies would give different results. Are these valid concerns, or should propensity score matching and CausalImpact always give similar predictions?
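To make the per-bin calculation concrete, here is a minimal sketch of a stratified estimate of the effect on the treated. It assumes the propensity scores have already been estimated; the function name `stratified_att`, the pooled-quantile binning, and the toy data are illustrative assumptions, not part of any package. The toy data is built so that the within-bin effect is 2 in every bin while the naive pooled difference is much larger, which is the kind of aggregation discrepancy the question worries about.

```python
import numpy as np

def stratified_att(y, treated, pscore, n_bins=5):
    """Stratified effect on the treated (hypothetical helper, not a library API).

    Bins are quantiles of the pooled propensity scores. Within each bin the
    counterfactual for the treated is the mean outcome of that bin's controls;
    bin-level effects are averaged with weights equal to the treated count.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    pscore = np.asarray(pscore, dtype=float)

    edges = np.quantile(pscore, np.linspace(0, 1, n_bins + 1))
    # Interior edges only, so every score lands in a bin 0..n_bins-1.
    bins = np.digitize(pscore, edges[1:-1], right=True)

    effects, weights = [], []
    for b in range(n_bins):
        t = (bins == b) & treated
        c = (bins == b) & ~treated
        if t.sum() == 0 or c.sum() == 0:
            continue  # no overlap in this bin: it contributes nothing
        effects.append(y[t].mean() - y[c].mean())
        weights.append(t.sum())
    if not effects:
        raise ValueError("no bin contains both treated and control units")
    return float(np.average(effects, weights=weights))

# Toy data (made up): baseline outcome is 1 in the low-score bin and 10 in
# the high-score bin; treatment adds exactly 2 in both.
pscore  = np.array([0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8])
treated = np.array([1, 0, 0, 0, 1, 1, 1, 0], dtype=bool)
y       = np.array([3.0, 1.0, 1.0, 1.0, 12.0, 12.0, 12.0, 10.0])

att = stratified_att(y, treated, pscore, n_bins=2)
naive = y[treated].mean() - y[~treated].mean()
print(att)    # 2.0
print(naive)  # 6.5
```

The gap between the two numbers comes entirely from treated units being concentrated in the high-baseline bin, so an aggregate comparison that ignores the bins overstates the effect.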

Best Answer

They are doing very similar things, so you should expect to see the same answer. In PSM, the counterfactual for each treated observation is a weighted average of untreated observations post-treatment. CausalImpact constructs the counterfactual to the observed post-intervention time series using a combination of all the control time series you feed it. If you have $N_C$ time series for the $N_C$ untreated units, you could use all $N_C$ of them as predictors. You can then plot the posterior probability of each being included in the model to see which ones were important (the analog of being in the same propensity-score bin).

An even better way to approximate what PSM is doing would be to take each treated time series (rather than aggregating them into a single one) and run a separate analysis for each, using all the untreated time series as predictors. Then average the cumulative effect across all the treated units and compare that to the cross-sectional PSM estimate. Note that in PSM you can include lagged outcomes as pre-treatment variables to match on (assuming that lagged pre-treatment outcomes are not altered in expectation of treatment), so this will approximate what CausalImpact is doing even more closely, since you are using the time-series variation in a similar way.
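The per-unit procedure can be sketched as follows. This is only an illustration under stated assumptions: it swaps in ordinary least squares on the pre-period for the Bayesian structural time series model that CausalImpact actually fits, so there is no variable selection and no posterior uncertainty. The function name `per_unit_cumulative_effects` and the synthetic series are hypothetical.

```python
import numpy as np

def per_unit_cumulative_effects(treated, controls, t0):
    """Cumulative post-period effect for each treated series.

    treated:  array of shape (n_treated, T)
    controls: array of shape (n_controls, T), used as predictors
    t0:       index of the first post-intervention period

    Assumption: a plain least-squares fit on the pre-period stands in for
    the BSTS model. With more controls than pre-period points, lstsq
    returns the minimum-norm solution, which is not what a spike-and-slab
    prior would do.
    """
    treated = np.atleast_2d(np.asarray(treated, dtype=float))
    controls = np.atleast_2d(np.asarray(controls, dtype=float))
    T = treated.shape[1]
    X = np.column_stack([np.ones(T), controls.T])  # intercept + controls

    effects = []
    for y in treated:
        beta, *_ = np.linalg.lstsq(X[:t0], y[:t0], rcond=None)
        counterfactual = X[t0:] @ beta          # predicted untreated path
        effects.append(np.sum(y[t0:] - counterfactual))
    return np.array(effects)

# Synthetic example (made up): 10 periods, intervention at t0 = 6.
t = np.arange(10)
c1 = t.astype(float)
c2 = (t % 2).astype(float)
controls = np.vstack([c1, c2])

post = (t >= 6).astype(float)
y1 = 1.0 + 2.0 * c1 + c2 + 3.0 * post   # true cumulative effect: 4 * 3 = 12
y2 = 0.5 * c1 - c2 + 1.0 * post         # true cumulative effect: 4 * 1 = 4
treated = np.vstack([y1, y2])

effects = per_unit_cumulative_effects(treated, controls, t0=6)
mean_effect = effects.mean()  # the quantity to compare with the PSM estimate
```

Averaging `effects` across treated units gives the number to set against the cross-sectional PSM estimate; with real data the per-unit fits would be noisy, and the averaging is what makes the comparison meaningful.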

Both methods also rely on SUTVA (the stable unit treatment value assumption, i.e., no interference between units), which might be hard to swallow in many financial settings.