Inferences on ratio of branch means in randomized experiment

causality, inference, ratio, regression, treatment-effect

It's generally well known that the difference in sample means is an unbiased estimator of the Average Treatment Effect (ATE) in randomized experiments: under randomization, $\mathbb{E}[Y|A=1]-\mathbb{E}[Y|A=0] = \mathbb{E}[Y(1) - Y(0)]$, where $A\in\{0,1\}$ indicates treatment branch and $Y(1),Y(0)$ are the potential outcomes under treatment and control, respectively (potential outcomes framework). As a result, confidence intervals for the ATE can be constructed using standard t-tests.

It's also well known that this is equivalent to a linear regression with a single indicator variable, that is, to inference on $\hat{\delta}$ in the model $y_i=\alpha + \delta\cdot t_i + \varepsilon_i$, where $t_i\in\{0,1\}$ is the treatment indicator for the $i$-th unit.
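As a quick check of that equivalence, here is a minimal sketch on simulated data (the variable names and the simulated effect size are illustrative assumptions). The match is exact for the pooled-variance t-test; a Welch test would give a slightly different interval.

```r
# Simulated two-arm experiment (all names and values are illustrative).
set.seed(1)
n_per_arm <- 200
t_i <- rep(0:1, each = n_per_arm)
y_i <- 1 + 0.5 * t_i + rnorm(2 * n_per_arm)

# Difference in sample means, treatment minus control.
diff_means <- mean(y_i[t_i == 1]) - mean(y_i[t_i == 0])

# OLS with a single treatment indicator.
ols <- lm(y_i ~ t_i)
delta_hat <- unname(coef(ols)["t_i"])

# Pooled-variance two-sample t-test, treatment minus control.
tt <- t.test(y_i[t_i == 1], y_i[t_i == 0], var.equal = TRUE)

# Same point estimate and same 95% confidence interval.
all.equal(delta_hat, diff_means)
all.equal(unname(confint(ols)["t_i", ]), as.numeric(tt$conf.int))
```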

I am interested in inferences on relative differences in means in a randomized experiment:

$$f(Y_t,Y_c)=\frac{\bar{Y}_t - \bar{Y}_c}{\bar{Y}_c}=\frac{\bar{Y}_t}{\bar{Y}_c}-1$$

We can construct confidence intervals for $f$ either analytically (an application of Fieller's theorem) or through bootstrapping.
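For the analytic route, a rough sketch of a Fieller-type interval for unadjusted arm means might look like the following. It assumes independent arms and uses a large-sample normal quantile; the function name `fieller_rel_diff_ci` and the simulated data are hypothetical.

```r
# Fieller-type interval for f = mean(yt) / mean(yc) - 1, independent arms.
# The ratio theta = a / b is in the interval when
#   (a - theta * b)^2 <= z^2 * (va + theta^2 * vb),
# which is a quadratic inequality in theta.
fieller_rel_diff_ci <- function(yt, yc, level = 0.95) {
  a  <- mean(yt);  b  <- mean(yc)
  va <- var(yt) / length(yt)   # squared standard error of the numerator mean
  vb <- var(yc) / length(yc)   # squared standard error of the denominator mean
  z  <- qnorm(1 - (1 - level) / 2)

  A <- b^2 - z^2 * vb
  B <- -2 * a * b
  C <- a^2 - z^2 * va
  if (A <= 0) {
    stop("Control mean is not clearly nonzero; the Fieller set is unbounded.")
  }
  roots <- (-B + c(-1, 1) * sqrt(B^2 - 4 * A * C)) / (2 * A)
  roots - 1   # shift from the ratio scale to the relative difference f
}

set.seed(1)
yt <- rnorm(500, mean = 1.5)   # illustrative treatment outcomes
yc <- rnorm(500, mean = 1.0)   # illustrative control outcomes
ci <- fieller_rel_diff_ci(yt, yc)
ci
```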

It's also known that covariate adjustment (the addition of prognostic or baseline variables) can improve the precision of the ATE estimate from the OLS model. This is commonly used in online experimentation platforms in industry.

It's easy to see how covariate adjustment improves the estimate of the ATE, that is, of the difference in means.
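For reference, the usual CUPED form of this adjustment, with $X$ a pre-experiment covariate (the symbols $\theta$ and $\rho_{XY}$ are notation introduced here, not used above):

$$Y^{cv} = Y - \theta\,(X - \mathbb{E}[X]), \qquad \theta = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}$$

$$\mathbb{E}[Y^{cv}] = \mathbb{E}[Y], \qquad \operatorname{Var}(Y^{cv}) = \operatorname{Var}(Y)\,(1 - \rho_{XY}^2)$$

Because $X$ is measured before randomization, it has the same distribution in both branches, so the difference in means of $Y^{cv}$ still targets the ATE while its variance shrinks by the factor $1 - \rho_{XY}^2$.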

My question is: is there a way to benefit from covariate adjustment for inferences on the relative difference (statistic $f$, above)? Can you, for example, draw inferences on $f(Y_t^{cv},Y_c^{cv})$ using bootstrapping, where $Y_t^{cv},Y_c^{cv}$ are the CUPED-adjusted metric values?
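To make the proposal concrete, here is a minimal sketch of what I have in mind, on simulated data. The data-generating values, the pooled estimate of the CUPED coefficient, and the names `dat` and `rel_diff_cuped` are all assumptions for illustration.

```r
set.seed(2)
n_per_arm <- 500
dat <- data.frame(trt = rep(0:1, each = n_per_arm), x = rnorm(2 * n_per_arm))
dat$y <- 1 + 2 * dat$x + 0.5 * dat$trt + rnorm(2 * n_per_arm, 0, 0.3)

# Relative difference in branch means of the CUPED-adjusted metric.
rel_diff_cuped <- function(d) {
  theta <- cov(d$y, d$x) / var(d$x)          # pooled CUPED coefficient
  y_cv  <- d$y - theta * (d$x - mean(d$x))   # adjusted metric, same overall mean as y
  mean(y_cv[d$trt == 1]) / mean(y_cv[d$trt == 0]) - 1
}

# Nonparametric bootstrap: resample units and re-estimate theta each time.
boot_f <- replicate(2000, rel_diff_cuped(dat[sample(nrow(dat), replace = TRUE), ]))
ci <- quantile(boot_f, c(0.025, 0.975))   # percentile interval for f
ci
```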

Best Answer

The added precision from covariate adjustment comes from the fact that some of the variation in the outcome may be due to variation in the adjustment variables. When this is the case, adjusting for those variables in OLS is essentially the same as the adjustment done by CUPED (perhaps better, considering that CUPED does not seem to alter the degrees of freedom used for the test statistic).
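A small sketch of that near-equivalence, on simulated data with illustrative names: the CUPED-adjusted difference in means and the OLS coefficient on the treatment indicator come out very close, though not identical in finite samples, because CUPED here uses a pooled slope while OLS uses the within-arm slope.

```r
set.seed(3)
n_per_arm <- 500
trt <- rep(0:1, each = n_per_arm)
x <- rnorm(2 * n_per_arm)
y <- 1 + 2 * x + 0.5 * trt + rnorm(2 * n_per_arm, 0, 0.3)

# CUPED: subtract the part of y explained by x, then take a difference in means.
theta <- cov(y, x) / var(x)
y_cv  <- y - theta * (x - mean(x))
ate_cuped <- mean(y_cv[trt == 1]) - mean(y_cv[trt == 0])

# OLS: treatment coefficient from the covariate-adjusted regression.
ate_ols <- unname(coef(lm(y ~ trt + x))["trt"])

c(cuped = ate_cuped, ols = ate_ols)
```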

If you're willing to use post-estimation techniques, like a marginal effect, then obtaining confidence intervals for the quantity $f$, sometimes called the excess risk ratio, should be straightforward.

Let's do a little example in R. Let's do one adjustment covariate and a randomized exposure, not unlike what you might see in experiments in industry.

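set.seed(0)
n <- 500                     # units per arm
trt <- rep(0:1, each = n)    # randomized exposure, 2 * n units in total
x <- rnorm(2 * n)            # one adjustment covariate with mean 0
y <- 2*x + 1 + 0.5*trt + rnorm(2 * n, 0, 0.3)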

The ATE in this example is 0.5, and the expected value for the potential outcome under no treatment is 1. This means we should get an estimate of the excess risk close to 0.5.

Let's fit a model and use marginaleffects to estimate this quantity with a 95% CI. marginaleffects can estimate something closely related to the excess risk by passing comparison = "lnratioavg" and transform=exp to the avg_comparisons function. This will actually estimate

$$ \tau = \ln(E[Y(1)]) - \ln(E[Y(0)]) $$

and then return estimates for $\exp(\tau)$, the ratio of the two means, which is close enough for our purposes (subtract 1 to get $f$). Let's do this in code.
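Spelled out, with $f$ read as the population version of the relative difference:

$$\exp(\tau) = \frac{E[Y(1)]}{E[Y(0)]} = 1 + \frac{E[Y(1)] - E[Y(0)]}{E[Y(0)]} = 1 + f$$

In the simulation $E[Y(0)] = 1$ and $E[Y(1)] = 1.5$, so $\exp(\tau) = 1.5$ and $f = 0.5$. Since the map is just a shift by 1, subtracting 1 from both endpoints of an interval for $\exp(\tau)$ gives an interval for $f$.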

library(marginaleffects)

d <- data.frame(x, trt, y)
fit <- lm(y~trt + x, data=d)

avg_comparisons(
  fit,
  variables = 'trt',
  comparison = 'lnratioavg',
  transform = exp
)
 Term              Contrast Estimate Pr(>|z|)     S 2.5 % 97.5 %
  trt ln(mean(1) / mean(0))      1.5   <0.001 453.8  1.46   1.55

Columns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
Type:  response 

We get an estimate of $\exp(\tau) = 1.5$ which is exactly where it should be. We also get a confidence interval for this quantity. Let's refit the model and see how much wider the CI is when we don't adjust for x.

fit <- lm(y~trt, data=d)

avg_comparisons(
  fit,
  variables = 'trt',
  comparison = 'lnratioavg',
  transform = exp
)

 Term              Contrast Estimate Pr(>|z|)    S 2.5 % 97.5 %
  trt ln(mean(1) / mean(0))      1.5   <0.001 12.6  1.22   1.86

Columns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
Type:  response 

The CI is much wider when we don't adjust for x, as expected. So this is how we can practically get a CI for the excess risk, and we get to benefit from covariate adjustment too.

Bootstrapping will work in this scenario as well. Here is an example:

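f <- function(d, idx){
  # Refit the adjusted model on the bootstrap resample
  fit <- lm(y~x + trt, data=d[idx, ])

  # Predicted mean outcome under control and under treatment at x = 0,
  # which is the population mean of x in this simulation
  ey0 <- predict(fit, newdata = data.frame(x=0, trt=0))
  ey1 <- predict(fit, newdata = data.frame(x=0, trt=1))

  # Relative difference in means
  (ey1 - ey0) / ey0
}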


result <- boot::boot(d, f, 1000)

boot::boot.ci(result)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot::boot.ci(boot.out = result)

Intervals : 
Level      Normal              Basic         
95%   ( 0.4565,  0.5539 )   ( 0.4566,  0.5553 )  

Level     Percentile            BCa          
95%   ( 0.4543,  0.5529 )   ( 0.4545,  0.5530 )  
Calculations and Intervals on Original Scale
Warning message:
In boot::boot.ci(result) :
  bootstrap variances needed for studentized intervals

The bootstrap intervals for $f$ (roughly 0.46 to 0.55) look fairly similar to the marginal effect interval for $\exp(\tau) = f + 1$ (1.46 to 1.55).
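One caveat on the bootstrap function above: predicting at x = 0 only works because x has mean 0 in this simulation. With real data you would more likely average the predictions over the observed covariates. A minimal sketch of that variant follows, using a plain resampling loop instead of boot; the names `sim` and `rel_diff_gcomp` are hypothetical, and x is given mean 1 so the difference is visible.

```r
set.seed(4)
n_per_arm <- 500
sim <- data.frame(trt = rep(0:1, each = n_per_arm),
                  x   = rnorm(2 * n_per_arm, mean = 1))
sim$y <- 1 + 2 * sim$x + 0.5 * sim$trt + rnorm(2 * n_per_arm, 0, 0.3)

# Average the model's predictions over the observed x, once with everyone
# set to control and once with everyone set to treatment.
rel_diff_gcomp <- function(d) {
  fit <- lm(y ~ x + trt, data = d)
  ey0 <- mean(predict(fit, newdata = transform(d, trt = 0)))
  ey1 <- mean(predict(fit, newdata = transform(d, trt = 1)))
  (ey1 - ey0) / ey0
}

# Here E[Y(0)] = 1 + 2 * 1 = 3, so the target is 0.5 / 3, not 0.5.
boot_f <- replicate(1000, rel_diff_gcomp(sim[sample(nrow(sim), replace = TRUE), ]))
ci <- quantile(boot_f, c(0.025, 0.975))   # percentile interval for f
ci
```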

So in short, you can just use regression and a marginal effect in most cases. Bootstrapping is a good idea too.