Regression – Interpreting LATE in an AB Test for Causal Inference

Suppose we run an AB test in which users in the treatment group are shown some content and users in the control group are not. For all intents and purposes, the content could be a button, and clicking on it is optional. After the experiment is done, we compare the treatment and control groups on some binary outcome.

The difference in means here is the effect of offering the content, not of clicking on it. This is known as the intention-to-treat (ITT) effect. Obviously, we can't just compare the users who clicked the content against control, because of selection effects.
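A small simulation may make the selection problem concrete. It is a sketch under an assumed data-generating process: the latent engagement score `u`, the 0.6 click threshold, the baseline outcome probabilities, and the true effect of 0.1 are all hypothetical, chosen only so that engaged users both click more and have better outcomes anyway.

```python
import random


def simulate(n_per_arm=100_000, effect=0.1, seed=1):
    """Simulate an AB test with one-sided noncompliance.

    Hypothetical data-generating process (an assumption for illustration):
    each user has a latent engagement score u ~ Uniform(0, 1); users with
    u > 0.6 click when offered the content; the baseline probability of the
    binary outcome rises with u, and clicking adds `effect` to it.
    """
    rng = random.Random(seed)
    rows = []  # (assigned, clicked, outcome)
    for assigned in (0, 1):
        for _ in range(n_per_arm):
            u = rng.random()
            # Control users never see the content, so they cannot click.
            clicked = int(assigned == 1 and u > 0.6)
            p = 0.2 + 0.3 * u + effect * clicked
            outcome = int(rng.random() < p)
            rows.append((assigned, clicked, outcome))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate()
control = mean(y for z, d, y in rows if z == 0)
treated_arm = mean(y for z, d, y in rows if z == 1)
clickers = mean(y for z, d, y in rows if z == 1 and d == 1)

itt = treated_arm - control   # effect of *offering* the content
naive = clickers - control    # clickers vs. control: confounded by engagement
print(f"ITT   = {itt:.3f}")
print(f"naive = {naive:.3f}")
```

Under these assumptions the ITT should come out near 0.04 (the true 0.1 effect diluted by the roughly 40% of offered users who click), while the naive clickers-versus-control comparison should land near 0.19, well above the true 0.1, because high-engagement users self-select into clicking.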

However, we can use treatment assignment as an instrumental variable. Angrist and Pischke write in their book Mostly Harmless Econometrics:

In many randomized trials, participation is voluntary among those randomly assigned to receive treatment. On the other hand, no one in the control group has access to the experimental intervention. Since the group that receives (i.e., complies with) the assigned treatment is a self-selected subset of those offered treatment, a comparison between those actually treated and the control group is misleading. The selection bias in this case is almost always positive: those who take their medicine in a randomized trial tend to be healthier; those who take advantage of randomly assigned economic interventions like training programs tend to earn more anyway.

[Instrumental variables] using the randomly assigned treatment intended as an instrumental variable for treatment received solves this sort of compliance problem. Moreover, LATE is the effect of treatment on the treated in this case.

If my read on this passage is correct, I can use the treatment assignment as an instrument to estimate the effect of treatment on the treated.
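A sketch of why the quoted claim holds, in the potential-outcomes notation Angrist and Pischke use: Z_i is random assignment, D_{1i} and D_{0i} are whether user i would click when assigned to treatment or control, and Y_{1i}, Y_{0i} are the potential outcomes with and without clicking. It assumes assignment is random and affects y only through clicking (the exclusion restriction). Because control users cannot see the content, D_{0i} = 0 for everyone, so monotonicity holds automatically and the compliers are exactly the users who click when offered.

```latex
\begin{aligned}
\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}
  &= E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}] \\
  &= E[Y_{1i} - Y_{0i} \mid D_{1i} = 1] \qquad (\text{since } D_{0i} = 0) \\
  &= E[Y_{1i} - Y_{0i} \mid D_i = 1].
\end{aligned}
```

The last step uses the fact that D_i = 1 only when Z_i = 1 and D_{1i} = 1, and that random assignment makes Z_i independent of (D_{1i}, Y_{1i}, Y_{0i}), so conditioning on Z_i = 1 changes nothing.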

Using R, I might do something like ivreg::ivreg(y ~ click | treatment) and interpret the estimated coefficient on click as the effect of clicking the content on the outcome y.
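With a single binary instrument and no covariates, the 2SLS coefficient on click from a call like the one above reduces to the Wald estimator: the ITT on y divided by the ITT on click (the first stage). The sketch below checks this on a made-up toy dataset; the helper names `wald`, `two_stage`, and `ols_slope` are hypothetical, not part of any library.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def wald(z, d, y):
    """ITT on the outcome divided by ITT on take-up (the first stage)."""
    def diff(v):
        treat = [vi for zi, vi in zip(z, v) if zi == 1]
        ctrl = [vi for zi, vi in zip(z, v) if zi == 0]
        return mean(treat) - mean(ctrl)
    return diff(y) / diff(d)


def two_stage(z, d, y):
    """Manual 2SLS: regress d on z, then regress y on the fitted values of d."""
    b = ols_slope(z, d)
    a = mean(d) - b * mean(z)
    d_hat = [a + b * zi for zi in z]
    return ols_slope(d_hat, y)


# Toy data (made up for illustration): assignment, click, binary outcome.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 0, 1, 1, 0, 0]
y = [0, 1, 0, 0, 1, 1, 0, 1]

print(wald(z, d, y), two_stage(z, d, y))  # → 1.0 1.0
```

The manual two-step fit reproduces only the point estimate; the standard errors from a naive second-stage regression are wrong, which is one reason to use a dedicated IV routine such as ivreg.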

Have I understood the use of IV in randomized experiments correctly? If not, what is the interpretation of the click coefficient in this case? Does it have a meaningful interpretation otherwise?

EDIT: I believe the interpretation is "the difference in mean outcomes between clicking and not clicking the content, in the group of users who would always click given the option".

Best Answer

I believe the coefficient on click is the effect of clicking the content on the outcome y for those people who click when given the treatment but would not have clicked otherwise.

See points 1 and 4 here: https://egap.org/resource/10-things-to-know-about-the-local-average-treatment-effect/
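To see which users this coefficient describes, the simulation below (again a hypothetical data-generating process, not taken from any of the references) gives clicking a different effect for users who click when offered (0.1) than for users who never click (0.3). Because control users cannot click, the Wald/IV estimate should track the mean effect among clickers (the ATT), not the population average effect (ATE).

```python
import random


def simulate(n_per_arm=100_000, seed=2):
    """Hypothetical DGP with heterogeneous effects (an assumption for illustration).

    Users with latent u > 0.6 click when offered (compliers); the rest never
    click. Clicking would raise the outcome probability by 0.1 for compliers
    but by 0.3 for never-clickers, an effect the experiment never reveals.
    """
    rng = random.Random(seed)
    rows = []  # (assigned, clicked, outcome, individual effect of clicking)
    for z in (0, 1):
        for _ in range(n_per_arm):
            u = rng.random()
            complier = u > 0.6
            tau = 0.1 if complier else 0.3
            d = int(z == 1 and complier)  # one-sided noncompliance
            p = 0.2 + 0.3 * u + tau * d
            y = int(rng.random() < p)
            rows.append((z, d, y, tau))
    return rows


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


rows = simulate()
itt_y = mean(r[2] for r in rows if r[0] == 1) - mean(r[2] for r in rows if r[0] == 0)
itt_d = mean(r[1] for r in rows if r[0] == 1) - mean(r[1] for r in rows if r[0] == 0)
late = itt_y / itt_d                         # Wald / IV estimate
att = mean(r[3] for r in rows if r[1] == 1)  # true mean effect among clickers
ate = mean(r[3] for r in rows)               # true mean effect among everyone

print(f"IV estimate = {late:.3f}, ATT = {att:.3f}, ATE = {ate:.3f}")
```

Under these assumptions the IV estimate should be close to 0.1 and the ATE close to 0.22; the 0.3 effect for never-clickers cannot be recovered, since nobody in that group is ever observed clicking.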

Interpretation of LATE is also discussed in Chapters 6.2.7 and 7.6 of "Causal Inference: The Mixtape" (Scott Cunningham), which gives the following reference (I haven't read the paper myself yet, though):

Imbens, Guido W., and Joshua D. Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62 (2): 467–75.
