Time-Series – How to Control for Confounders by Matching Based on Outcome Variables

causality, confounding, matching, time-series

Suppose that Walmart has 100 stores. It has a coupon for cereal, and it wants to know if the coupon increases cereal sales by a significant amount.

Walmart puts the coupon on the cereal shelf in 10 stores; let's call this the treatment group.

The other 90 stores do not have the coupon; let's call this the control group.

There are all kinds of confounders that affect sales, like

  • the median income of the shoppers who visit the store
  • the number of shoppers who visit the store
  • the number of competing retailers near the store

Because of these confounders, we cannot simply compare average sales between the two groups; we have to rely on causal inference methods.

I know that there are 2 common ways in causal inference to deal with situations like this:

  1. controlling for confounders in a regression model
  2. estimating propensity scores using the confounders, then matching each treatment store to a control store with similar propensity scores, and doing a paired t-test
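
To make option 2 concrete, here is a rough sketch of how I understand it (simulated data, invented column names, and nearest-neighbor matching with replacement are all my own simplifications):

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical store-level data: one row per store, with the
# confounders listed above. Stores 0-9 are the treatment group.
df = pd.DataFrame({
    "treated": np.r_[np.ones(10), np.zeros(90)].astype(int),
    "median_income": rng.normal(60_000, 10_000, 100),
    "foot_traffic": rng.normal(5_000, 1_000, 100),
    "n_competitors": rng.poisson(3, 100),
    "sales": rng.normal(1_000, 200, 100),
})

# 1. Estimate propensity scores from the confounders.
X = df[["median_income", "foot_traffic", "n_competitors"]]
df["ps"] = LogisticRegression().fit(X, df["treated"]).predict_proba(X)[:, 1]

# 2. Match each treated store to the control store with the nearest
#    propensity score (with replacement, for simplicity).
treat = df[df["treated"] == 1]
ctrl = df[df["treated"] == 0]
match_idx = [(ctrl["ps"] - p).abs().idxmin() for p in treat["ps"]]

# 3. Paired t-test on sales across the matched pairs.
t, p = stats.ttest_rel(treat["sales"].values,
                       df.loc[match_idx, "sales"].values)
print(f"paired t = {t:.2f}, p = {p:.3f}")
```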

My question is about a possible 3rd method that I learned from casual conversations with other data scientists. Here are the steps.

A. For all 100 stores, find the time series of cereal sales for 1 year before the introduction of the coupon.

B. For each treatment store, find 5 control stores whose cereal sales histories match the treatment store's cereal sales history.

C. Using the matched control stores, build some regression model to predict the counterfactual sales for the treatment stores (i.e. what their cereal sales would have been without the coupon).

D. Calculate the average treatment effect on the treated (ATT) on the treatment stores based on their actual sales and their counterfactual sales.
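
In code, my rough understanding of steps A through D looks something like this (toy data; the Euclidean distance metric and the no-intercept regression on 5 donors are my guesses at what my colleagues meant):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A. Weekly cereal sales: 52 pre-coupon weeks and 12 post-coupon weeks
#    for 100 stores. Columns 0-9 are the treatment stores.
pre = pd.DataFrame(rng.normal(1_000, 100, (52, 100)))
post = pd.DataFrame(rng.normal(1_000, 100, (12, 100)))
treated_ids, control_ids = range(10), range(10, 100)

effects = []
for t in treated_ids:
    # B. Find the 5 control stores whose pre-period sales history is
    #    closest (Euclidean distance between the 52-week vectors).
    dists = {c: np.linalg.norm(pre[t] - pre[c]) for c in control_ids}
    donors = sorted(dists, key=dists.get)[:5]

    # C. Regress the treated store's pre-period sales on the donors'
    #    sales (no intercept, for simplicity), then predict the
    #    post-period counterfactual sales.
    coef, *_ = np.linalg.lstsq(pre[donors].values, pre[t].values,
                               rcond=None)
    counterfactual = post[donors].values @ coef

    # D. Per-store effect: actual post-period sales minus counterfactual.
    effects.append((post[t].values - counterfactual).mean())

print("ATT estimate:", np.mean(effects))
```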

I am not fully clear on the validity of this approach. The claim seems to be that matching on the sales histories somehow bypasses the confounding effects. (I don't understand this, and nobody has been able to explain it to me properly. I'm posting this question to gain some clarity.)

To the experts on causal inference: Is this approach correct? If so, how exactly does matching on the outcome variable (i.e. sales) overcome the confounding effects?

Best Answer

It seems like you are describing something like a difference-in-differences panel model with matching on pre-treatment trends. A common framework for this is an event study model, which can also incorporate matching.

In general, we use matching to build a synthetic counterfactual population. You would try to match on the pre-treatment or time-invariant factors that are most likely to lead to selection or self-selection into treatment. If you can build a counterfactual that is sufficiently similar, you can proceed as if your matched sample were an equivalent of the treatment sample that never received treatment. The researcher can build the case that the matched sample is sufficiently similar by comparing the expected/mean values of important attributes, often through t-tests or similar. If there is no statistically significant difference across the important attributes, you have some evidence that your matched sample is a good counterfactual. You can never be totally certain you have identified all key elements, which is why this is not as reliable as a true experiment with randomly assigned treatment.
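
As a minimal sketch of that balance check in Python (the data and attribute names are invented for illustration; in practice `treated` and `matched` would come from your actual matching procedure):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical matched samples: 10 treated stores and their 10 matched
# controls, with two of the confounders from the question.
treated = {"median_income": rng.normal(62_000, 9_000, 10),
           "foot_traffic": rng.normal(5_200, 900, 10)}
matched = {"median_income": rng.normal(61_000, 9_000, 10),
           "foot_traffic": rng.normal(5_100, 900, 10)}

# Two-sample t-test per attribute; no significant difference is (weak)
# evidence that the matched sample is a reasonable counterfactual.
for col in treated:
    t, p = stats.ttest_ind(treated[col], matched[col])
    print(f"{col}: t = {t:.2f}, p = {p:.3f}")
```

Note that a non-significant difference is absence of evidence of imbalance, not proof of balance, which is part of why this is weaker than randomization.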

If you are matching with panel data, stable attributes can be important, but you also need to examine whether your two groups show evidence of "parallel trends" during the pre-treatment periods. This is probably what was meant when you heard about "sales history" in building a matched control sample. Two stores might be very similar in size, location, average sales, or a range of other attributes, but if they were not exhibiting similar trends in sales before treatment, there is no reason to think they would move in parallel post-treatment. In that case the control cases would not show you how the treatment cases would have behaved in the absence of treatment.

Event study models test for "non-parallel pre-treatment trends" by picking one pre-treatment time point as a reference and testing whether the difference-in-differences at each of the other pre-treatment time points is significant. The two groups don't have to be on the same "level" or nominal value for this to be valid; they just have to continue or change in a similar way at each time point up to the point when treatment is applied. It is also extremely helpful to know whether treated cases anticipated the treatment. If they did, the researcher needs to think about how anticipation would alter pre-treatment levels; it might cause the researcher to reject a decent matched sample or accept an inappropriate one. This isn't perfect and can't eliminate selection effects as reliably as random assignment, but it helps build the case that the counterfactual group is appropriate.
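
A minimal sketch of such a pre-trend test, assuming weekly panel data in long format (the column names, the simulated data, and the choice of week 5 as the omitted reference period are all my own illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Toy long-format panel: 20 stores x 10 weeks; stores 0-4 get the
# treatment starting in week 6, with a true effect of +50.
rows = []
for store in range(20):
    treated = int(store < 5)
    for week in range(10):
        rows.append({"store": store, "week": week, "treated": treated,
                     "sales": 1_000 + 10 * week
                              + 50 * treated * (week >= 6)
                              + rng.normal(0, 5)})
df = pd.DataFrame(rows)

# Treated x week dummies, omitting week 5 as the reference period.
event_weeks = [w for w in range(10) if w != 5]
for w in event_weeks:
    df[f"d{w}"] = df["treated"] * (df["week"] == w).astype(int)

# Event study: store FE + week FE + event-time dummies. The pre-period
# coefficients (d0-d4) should be near zero if trends are parallel.
formula = "sales ~ C(store) + C(week) + " + " + ".join(
    f"d{w}" for w in event_weeks)
fit = smf.ols(formula, data=df).fit()
print(fit.params.filter(regex=r"^d\d"))
```

A joint F-test of the pre-period dummies (e.g., `fit.f_test("d0 = 0, d1 = 0, d2 = 0, d3 = 0, d4 = 0")`) is a common way to summarize the pre-trend evidence.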

If you have a large pool of potential counterfactual cases, you might use matching on pre-treatment time trends (differences in values from one point to the next) to find better control cases.
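
One rough way to operationalize that (my own sketch, not a standard library routine): first-difference each store's pre-period series and match on distances between the difference vectors, so stores with similar week-over-week movements match even if their sales levels differ.

```python
import numpy as np

rng = np.random.default_rng(2)

# 52 weeks of pre-treatment sales for one treated store and 90 controls.
treated_sales = rng.normal(1_000, 100, 52)
control_sales = rng.normal(1_000, 100, (90, 52))

# First-difference the series, then rank controls by how closely their
# week-over-week changes track the treated store's changes.
d_treated = np.diff(treated_sales)
d_controls = np.diff(control_sales, axis=1)
dist = np.linalg.norm(d_controls - d_treated, axis=1)
print("closest 5 controls by trend:", np.argsort(dist)[:5])
```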

Finding a matched sample that is similar in both trend and nominal values ("level heterogeneity") is often difficult. If we have evidence of no non-parallel trends, we can deal with level heterogeneity by calculating the ATT as $$\text{ATT} = (\bar{Y}^{\text{treated}}_{\text{post}} - \bar{Y}^{\text{treated}}_{\text{pre}}) - (\bar{Y}^{\text{control}}_{\text{post}} - \bar{Y}^{\text{control}}_{\text{pre}})$$ This way the ATT ignores any difference in expected nominal value between treatment and control. In event study models, it is more common to use unit (store) fixed effects that subtract each store's grand/panel mean from each time point, which forces all units/stores onto the same expected nominal value.
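
As a worked toy example with invented numbers: if the treated stores average 1,100 post-treatment and 1,000 pre-treatment, while the matched controls average 1,030 post and 980 pre, then $$\text{ATT} = (1100 - 1000) - (1030 - 980) = 100 - 50 = 50,$$ even though the treated stores sit at a higher sales level than the controls throughout.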

The important thing about these models is that they are imperfect compared with true random assignment in the research design (which is often not possible), but they are often the best we can do, and they can be quite convincing if the researcher is careful and transparent about the data. Here's a nice resource.

Edit adding answer to OP's useful follow-up question:

In my opinion, you never have certainty that you are removing confounding effects in observational (non-experimental) data. You can do certain things that make confounding effects less likely and results more believable. It would be important to know whether you are matching on "levels" of pre-treatment sales (i.e. avg. values) or on trends of pre-treatment sales. The former adds something but is not enough to trust results (IMO). To me, the latter can be convincing, especially when you test for non-parallel trends in sales and eliminate "level" heterogeneity with store FEs (event study framework).
