Econometrics – Interpreting Causal Effect from Instrumental Variables in Experiment Design

causality, econometrics, experiment-design

We ran an experiment in which users could optionally watch a video. The video is intended to drive some other metric, $y$.

Because the intervention (video) was optional, choosing to watch the video is confounded by omitted variables (most probably being the kind of person to watch instructional videos). Since treatment was randomized, we can use it as an instrument to estimate the effect of watching the video on $y$.

Our data and analysis are shown below.

library(tidyverse)
library(ivreg)

d <- data.frame(
  treatment = c(0, 0, 1, 1, 1, 1),
  clicked = c(0, 0, 0, 0, 1, 1),
  y = c(0, 1, 0, 1, 0, 1),
  n = c(3008L, 2075L, 2779L, 2038L, 145L, 74L)
) %>% 
  uncount(n)


mfit <- ivreg(y ~ clicked | treatment, data=d)
mfit
#> 
#> Call:
#> ivreg(formula = y ~ clicked | treatment, data = d)
#> 
#> Coefficients:
#> (Intercept)      clicked  
#>      0.4082       0.2566
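As a sanity check (not part of the original analysis): with a single binary instrument and a single binary regressor, the IV slope is the Wald estimator, a ratio of two differences in means, which can be computed directly from the cell counts above.

```r
# Wald estimator: with one binary instrument and one binary endogenous
# regressor, the IV slope is the ratio of two differences in means.
p_y1 <- (2038 + 74) / (2779 + 2038 + 145 + 74)  # mean y in treatment
p_y0 <- 2075 / (3008 + 2075)                    # mean y in control
p_c1 <- (145 + 74) / (2779 + 2038 + 145 + 74)   # click rate in treatment
p_c0 <- 0                                       # control users cannot click
wald <- (p_y1 - p_y0) / (p_c1 - p_c0)
round(wald, 4)
#> [1] 0.2566
```

This matches the `clicked` coefficient from ivreg.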

The causal estimate of watching the video is 0.26. Because we typically report these effects on the relative scale, I need to know the estimated rate of $y$ when clicks do not happen. My approach was simply to use the predict method on `mfit` to get estimated rates when clicked = 0, 1.

This results in $E[y] = 0.66$ for users who clicked, and $E[y]=0.41$ for users who did not click.
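For reference, the fitted values from the IV model are linear in the coefficients, so those predicted rates can be reproduced by hand from the printed output:

```r
# Predicted rates are just sums of the reported ivreg coefficients.
b0 <- 0.4082  # intercept from the ivreg output
b1 <- 0.2566  # clicked coefficient
c(not_clicked = b0, clicked = b0 + b1)
#> not_clicked     clicked 
#>      0.4082      0.6648
```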

My colleague points out that the data do not support this interpretation. Examining only the users who clicked on the video, the estimated rate of $y$ is 0.34:

d %>% 
  filter(clicked == 1) %>% 
  summarise(mean(y))
#>     mean(y)
#> 1 0.3378995

Question

I believe I am wrong to have used the predict method to estimate the rate of $y$ when clicked = 0. This is because to estimate that rate, I would need the unmeasured confounders. So while I can estimate the effect of the click on $y$, I cannot estimate $E[y\mid \mbox{clicked}]$.

Am I correct? If not, how can I reconcile the difference in estimated rates between the model and the data?

Edit

After some more thinking, I'm inclined to say it would be impossible to estimate the relative improvement for users who clicked.

The intercept in the ivreg call is the same as the marginal mean of the control group. However, that estimate comprises an unknown proportion of compliers and non-compliers.

Even if we knew the proportion of compliers, the estimated rate of $y$ for compliers is unknown because it could depend on unmeasured confounders. So we can't know the other term in our causal contrast; we can only know the LATE from using the instrument. That is a fine answer, I just need to know for certain that it isn't possible.

Best Answer

I think in an A/B test like this, where you have an encouragement design with one-sided non-compliance, you can make some progress under reasonable assumptions.

Using the notation from here, where C stands for compliers, not clickers, the LATE identified by IV is $$\Delta_{IV} =\frac{E(Y_1 \vert C) \cdot Pr(C)−E(Y_0 \vert C) \cdot Pr(C)}{Pr(C)}=E(Y_1 \vert C)−E(Y_0 \vert C)$$

Putting this on the relative scale means computing:

$$\%\Delta_{IV}=\frac{E(Y_1 \vert C)−E(Y_0 \vert C)}{E(Y_0 \vert C)}.$$

The issue is that you don't know the denominator. Assuming treatment is randomized, in the control group:

$$\require{cancel} E(Y_0) = \cancel{E(Y_0 \vert AT)\cdot Pr(AT)}+E(Y_0 \vert C)\cdot Pr(C)+ \cancel{E(Y_0 \vert DF) \cdot Pr(DF)}+E(Y_0 \vert NT) \cdot Pr(NT)$$

Here always-takers go away since control users cannot click. Ditto for defiers. This is different from the typical labor economics experiment, where people can take up job training somewhere else even if they are in the control group. So the LATE = ATT.

You also know that your treatment group is made up of compliers, who all click, and never-takers, who don't. This allows you to separate the two groups cleanly. The same logic applies to the control group, since the types are fixed. The outcome for the never-takers should be the same in treatment and control as long as the video is the only channel by which the treatment can change the outcome. This rules out behavior like control users getting pissed about being denied the video and reducing purchases.

But if you are willing to make these reasonable assumptions, you can back out the share of never-takers in the treatment group (96%) and the mean untreated outcome for them (0.42). You can also get the share of compliers in the treated group (4%), which should be the same as in the control. You know the mean of the untreated outcome for never-takers and compliers together in control (0.41). That is enough to pin down the mean untreated outcome for compliers (0.08), which should be the same in treatment. That gives a relative lift of $\frac{0.26}{0.08} \approx 3.15$, i.e., roughly a 315% improvement. This is pretty large, but not statistically significant. Your first stage is strong, so this is probably not a weak-instrument artifact.
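That arithmetic can be reproduced directly from the cell counts (a sketch; the 0.2566 LATE is taken from the ivreg output, and the variable names are mine):

```r
# Back out the complier pieces from the cell counts, assuming one-sided
# non-compliance (control users cannot click) and no direct effect of
# assignment on never-takers.
n_trt  <- 2779 + 2038 + 145 + 74   # treatment group size
pr_nt  <- (2779 + 2038) / n_trt    # never-taker share, ~0.96
pr_c   <- (145 + 74) / n_trt       # complier share, ~0.04
ey0_nt <- 2038 / (2779 + 2038)     # untreated mean for never-takers, ~0.42
ey0    <- 2075 / (3008 + 2075)     # control mean (NT and C together), ~0.41
# Solve E(Y0) = E(Y0|NT) * Pr(NT) + E(Y0|C) * Pr(C) for E(Y0|C)
ey0_c  <- (ey0 - ey0_nt * pr_nt) / pr_c
late   <- 0.2566                   # IV estimate from the ivreg fit
round(c(ey0_c = ey0_c, rel_lift = late / ey0_c), 2)
#>    ey0_c rel_lift 
#>     0.08     3.15
```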

Issues of statistical significance aside, this result implies that you have very low take-up, but an enormous effect for those who do it. You may want to explore making take-up easier (product changes making the video more prominent, like screen takeovers, subsidies for watching, etc.). You can also try to fit a model for $\Pr(Complier \vert X)$. Maybe all the compliers are new users, so the strategy above is limited by the inflow of new users.

Standard errors are a bit trickier, but you can bootstrap the IV regression plus the complier arithmetic jointly.
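A sketch of that bootstrap in base R, under the same assumptions as above. The Wald ratio stands in for the ivreg call (they are identical in this just-identified case), and the data frame is rebuilt inline so the snippet is self-contained:

```r
# Bootstrap the relative lift: resample rows, recompute the IV (Wald)
# estimate and the complier arithmetic together, then take percentiles.
n <- c(3008, 2075, 2779, 2038, 145, 74)
d <- data.frame(
  treatment = rep(c(0, 0, 1, 1, 1, 1), times = n),
  clicked   = rep(c(0, 0, 0, 0, 1, 1), times = n),
  y         = rep(c(0, 1, 0, 1, 0, 1), times = n)
)

rel_lift <- function(dat) {
  trt <- dat[dat$treatment == 1, ]
  ctl <- dat[dat$treatment == 0, ]
  late   <- (mean(trt$y) - mean(ctl$y)) / mean(trt$clicked)  # Wald = IV slope
  pr_c   <- mean(trt$clicked)                                # complier share
  ey0_nt <- mean(trt$y[trt$clicked == 0])                    # never-taker mean
  ey0_c  <- (mean(ctl$y) - ey0_nt * (1 - pr_c)) / pr_c       # complier untreated mean
  late / ey0_c
}

set.seed(1)
boots <- replicate(1000, rel_lift(d[sample(nrow(d), replace = TRUE), ]))
quantile(boots, c(0.025, 0.975))  # percentile interval for the relative lift
```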