Experiment Design – Comparing CUPED and Difference-in-Difference Methods

difference-in-difference experiment-design regression

I've recently come across a method known as "Controlled-experiment Using Pre-Experiment Data" (CUPED), which is a regression adjustment technique to create unbiased estimators in experiments. I also have prior familiarity with difference-in-differences (DiD).

As I understand it, DiD makes the assumption that treatment groups and control groups are changing at the same rate; the means of treatment and control might differ before the experiment begins. However, we're interested in the change in the treatment group (while accounting for the existing trend) and the change in the control group (only the existing trend).

CUPED seems to accomplish much the same thing. But I've heard that instead of modeling the difference in trends, it tries to account for trends such that in the absence of treatment exposure, the treatment group is corrected to match the control group. And this adjustment carries through the experiment such that treatment and control groups could be comparable directly. Is any of this true?

Further, I have a few questions:

  1. Does CUPED assume non-zero trends exist in both treatment and control groups?
  2. If [1], does CUPED assume the same trend?
  3. Does CUPED's adjustment require data from pre- and post-experiment start (like DiD)? Or does CUPED instead use purely pre-experiment panel data to make the adjustment?
  4. Do both treatment and control groups receive an adjustment?

Best Answer

I'll try to answer this question from a practitioner's perspective: in A/B testing (online controlled experiments), when people mention CUPED or DID, what are they actually doing, and what assumptions do these practices reflect?

For simplicity, we start from simulated data:

  • y: response variable, i.e. the post-experiment data
  • t: assignment indicator (1 = treatment group, 0 = control group)
  • x: pre-experiment variable, usually the value of y in the pre-experiment period

library(tidyverse)
options(pillar.sigfig = 6)
options(digits = 6)

set.seed(0)
N = 10000 # sample size
p = 0.5 # treat group ratio
tau = 0.2 # true treat effect
theta = 0.6 # true slope of y on x (a regression coefficient, not the correlation)
t = ifelse(runif(N) > p, 1, 0) # assign variable
x = rnorm(N) # pre-experiment variable
y = theta * x + t * tau + rnorm(N) # response variable
df = data.frame(y, t, x)

Diff-in-Diff

In practical A/B testing, some people do use DID because it seems intuitive, although many others criticize it as inappropriate in the context of randomized experiments.

Here are the summary statistics. The raw difference in y is 0.197327 - 0.00625198 = 0.191075, and the pre-existing difference in x is -0.00297056 - 0.0151793 = -0.0181499, close to but not exactly 0.

df |>
  group_by(t) |>
  summarise(
    n = n(),
    my = mean(y),
    mx = mean(x)
  )
      t     n         my          mx
  <dbl> <int>      <dbl>       <dbl>
1     0  5040 0.00625198  0.0151793 
2     1  4960 0.197327   -0.00297056

DID corrects for the pre-existing difference in x as follows: (0.197327 - 0.00625198) - (-0.00297056 - 0.0151793) = 0.209225.

Here is the so-called DID model, also known as the change-score model. The estimate for t is the same as the value calculated manually above.

did_mod = summary(lm(y ~ t, df, offset = x)) # or summary(lm(y - x ~ t, df))
did_mod$coef
               Estimate Std. Error   t value    Pr(>|t|)
(Intercept) -0.00892729  0.0151799 -0.588099 5.56479e-01
t            0.20922526  0.0215540  9.707023 3.52553e-22
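
As a cross-check, the manual DID arithmetic and the change-score regression can be reproduced side by side in base R. This is a minimal sketch that re-creates the simulated data with the same seed and parameters as above; the names `delta_y`, `delta_x`, `did_manual`, and `did_lm` are introduced here for illustration only.

```r
# Re-create the simulated data from above (base R only, no tidyverse needed).
set.seed(0)
N <- 10000; p <- 0.5; tau <- 0.2; theta <- 0.6
t <- ifelse(runif(N) > p, 1, 0)
x <- rnorm(N)
y <- theta * x + t * tau + rnorm(N)

# Difference in group means for the outcome and for the pre-period covariate.
delta_y <- mean(y[t == 1]) - mean(y[t == 0])
delta_x <- mean(x[t == 1]) - mean(x[t == 0])

# DID by hand: subtract the pre-existing gap from the raw gap.
did_manual <- delta_y - delta_x

# DID as a regression on the change score y - x.
did_lm <- unname(coef(lm(I(y - x) ~ t))["t"])

print(c(manual = did_manual, regression = did_lm))
```

The two numbers agree because the coefficient on a binary regressor is exactly the difference in group means of the left-hand side.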

In the DID model, absent a treatment effect, y = x + intercept. Here the intercept is the time effect, which is assumed constant for all units.
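
Written out as a model, the change-score regression can be sketched as follows. The notation is assumed here rather than taken from a reference: $T_i$ is the assignment indicator t, $\alpha$ is the intercept, and $\varepsilon_i$ is the error term.

$$Y_i - X_i = \alpha + \tau \cdot T_i + \varepsilon_i$$

Under this reading, $\alpha$ plays the role of the common time effect, and the least squares estimate of $\tau$ is $(\overline Y_1 - \overline X_1) - (\overline Y_0 - \overline X_0)$, which is the manual DID calculation.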

CUPED

The main idea of CUPED is:

  • Compute $\theta$

$$\theta = \frac{\operatorname{cov}(Y, X)}{\operatorname{var}(X)} = \operatorname{corr}(X, Y) \cdot \frac{\operatorname{sd}(Y)}{\operatorname{sd}(X)} = \rho \cdot \frac{\sigma_Y}{\sigma_X}$$

$\theta$ turns out to be the ordinary least squares (OLS) slope from regressing Y on X:

theta = cov(df$y, df$x) / var(df$x)
theta
summary(lm(y ~ x, df))$coef['x', 'Estimate']
[1] 0.594981
[1] 0.594981
  • Compute the adjusted $Y_i^{cv} = Y_i - (X_i - \mu_X) \cdot \theta$ for each user
mx = mean(df$x)
df$y_cv = df$y - theta * (df$x - mx)
  • Evaluate the A/B test using $Y_i^{cv}$ instead of $Y_i$
cuped_mod = summary(lm(y_cv ~ t, df))
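
The choice of $\theta$ in the first step can be motivated by a short variance calculation. This is a sketch of the standard argument, treating $\theta$ as a fixed constant:

$$\operatorname{var}(Y - \theta X) = \operatorname{var}(Y) - 2\theta \operatorname{cov}(Y, X) + \theta^2 \operatorname{var}(X)$$

Setting the derivative with respect to $\theta$ to zero gives $\theta = \operatorname{cov}(Y, X) / \operatorname{var}(X)$, and plugging it back in gives

$$\operatorname{var}(Y - \theta X) = \operatorname{var}(Y) \cdot (1 - \rho^2)$$

so the stronger the correlation between the pre-experiment variable and the response, the larger the variance reduction. Subtracting the constant $\mu_X$ does not change the variance; it only keeps the mean of $Y^{cv}$ equal to the mean of $Y$.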

The resulting CUPED-adjusted estimate of the treatment effect is

$$\begin{aligned}\tau &= (\overline Y_1 - \theta \cdot (\overline X_1 - \mu_X)) - (\overline Y_0 - \theta \cdot (\overline X_0 - \mu_X)) \\ &= (\overline Y_1 - \overline Y_0) - \theta \cdot (\overline X_1 - \overline X_0)\end{aligned}$$

In our case, with $\theta = 0.594981$, $\tau = (0.197327 - 0.00625198) - \theta \cdot (-0.00297056 - 0.0151793) = 0.201874$, the same as below:

cuped_mod = summary(lm(y_cv ~ t, df))
cuped_mod$coef
cuped_mod$coef['t', 'Estimate']
               Estimate Std. Error    t value    Pr(>|t|)
(Intercept) 0.000895777  0.0140728  0.0636532 9.49248e-01
t           0.201874228  0.0199820 10.1028043 6.98895e-24
[1] 0.201874
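
The same equivalence can be checked in code. The sketch below re-creates the simulated data with the same seed and parameters as above and compares the closed-form CUPED estimate with the regression on the adjusted metric; `theta_hat`, `cuped_manual`, and `cuped_lm` are names made up for this illustration.

```r
# Re-create the simulated data from above (base R only).
set.seed(0)
N <- 10000; p <- 0.5; tau <- 0.2; theta <- 0.6
t <- ifelse(runif(N) > p, 1, 0)
x <- rnorm(N)
y <- theta * x + t * tau + rnorm(N)

# Step 1: theta from the pooled data (treatment and control together).
theta_hat <- cov(y, x) / var(x)

# Step 2: adjusted metric; centering x keeps the overall mean of y unchanged.
y_cv <- y - theta_hat * (x - mean(x))

# Step 3: difference in means of the adjusted metric, via regression.
cuped_lm <- unname(coef(lm(y_cv ~ t))["t"])

# Closed form: raw gap in y minus theta times the pre-existing gap in x.
delta_y <- mean(y[t == 1]) - mean(y[t == 0])
delta_x <- mean(x[t == 1]) - mean(x[t == 0])
cuped_manual <- delta_y - theta_hat * delta_x

print(c(manual = cuped_manual, regression = cuped_lm))
```

Note that the centering term $\mu_X$ cancels in the difference between groups, which is why it does not appear in the closed form.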

DID vs CUPED

  • The only difference is $\theta$: DID fixes $\theta = 1$, whereas CUPED estimates it from the data as $\theta = \frac{\operatorname{cov}(Y, X)}{\operatorname{var}(X)}$. Other ways of computing $\theta$ also exist.

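  • Because $\theta$ differs, the standard errors of the CUPED and DID estimates differ. CUPED chooses $\theta$ to minimize the variance of the adjusted metric, so its estimate is generally more precise; here the standard error is 0.0200 for CUPED versus 0.0216 for DID.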

  • Because $\theta$ differs, the treatment effects estimated by CUPED and DID also differ slightly (0.2019 versus 0.2092 here). See [Best practice when analysing pre-post treatment-control designs](https://stats.stackexchange.com/questions/3466/best-practice-when-analysing-pre-post-treatment-control-designs).
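
To make the comparison concrete, DID and CUPED can be treated as two members of one family: the difference in means of $y - \theta \cdot x$ for a chosen $\theta$. The sketch below re-creates the simulated data with the same seed and parameters as above and evaluates three choices: $\theta = 0$ (the raw difference), $\theta = 1$ (DID), and the CUPED $\theta$. The helper `adjusted_effect` and its argument `theta_k` are hypothetical names introduced for this example.

```r
# Re-create the simulated data from above (base R only).
set.seed(0)
N <- 10000; p <- 0.5; tau <- 0.2; theta <- 0.6
t <- ifelse(runif(N) > p, 1, 0)
x <- rnorm(N)
y <- theta * x + t * tau + rnorm(N)

# Estimate and standard error of the treatment effect on y - theta_k * x.
adjusted_effect <- function(theta_k) {
  fit <- summary(lm(I(y - theta_k * x) ~ t))
  fit$coefficients["t", c("Estimate", "Std. Error")]
}

theta_cuped <- cov(y, x) / var(x)

results <- rbind(
  raw   = adjusted_effect(0),           # no adjustment
  did   = adjusted_effect(1),           # DID: theta fixed to 1
  cuped = adjusted_effect(theta_cuped)  # CUPED: theta estimated from data
)
print(results)
```

In this simulation the true slope is 0.6, so fixing $\theta = 1$ over-corrects: DID still reduces the standard error relative to the raw difference, but by less than CUPED does.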

Back to your question

Does CUPED assume non-zero trends exist in both treatment and control groups? If [1], does CUPED assume the same trend?

Yes, in the sense that a single $\theta$ is estimated from the pooled data and applied to both the treatment and control groups. But I think $\theta$ could also be estimated separately in the treatment group and the control group.
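
One possible reading of "estimating $\theta$ separately" is sketched below; it is an illustration under that assumption, not part of the standard CUPED recipe above. Each group gets its own slope, and the resulting estimate coincides with the coefficient on t in a regression that interacts t with the centered covariate. The data are re-created with the same seed and parameters as above, and `theta0`, `theta1`, `xc`, `tau_sep`, and `tau_lm` are names introduced here.

```r
# Re-create the simulated data from above (base R only).
set.seed(0)
N <- 10000; p <- 0.5; tau <- 0.2; theta <- 0.6
t <- ifelse(runif(N) > p, 1, 0)
x <- rnorm(N)
y <- theta * x + t * tau + rnorm(N)

# Center x on the overall mean so the adjustment leaves the mean of y alone.
xc <- x - mean(x)

# One slope per group instead of a single pooled theta.
theta0 <- cov(y[t == 0], x[t == 0]) / var(x[t == 0])
theta1 <- cov(y[t == 1], x[t == 1]) / var(x[t == 1])

# Adjust each group mean with its own slope, then take the difference.
tau_sep <- (mean(y[t == 1]) - theta1 * mean(xc[t == 1])) -
           (mean(y[t == 0]) - theta0 * mean(xc[t == 0]))

# Same estimate from a fully interacted regression.
tau_lm <- unname(coef(lm(y ~ t * xc))["t"])

print(c(theta0 = theta0, theta1 = theta1, separate = tau_sep, regression = tau_lm))
```

In this simulation both groups share the same true slope, so the per-group slopes should land close to each other and to the pooled $\theta$.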

Does CUPED's adjustment require data from pre- and post-experiment start (like DiD)? Or does CUPED instead use purely pre-experiment panel data to make the adjustment? Do both treatment and control groups receive an adjustment?

CUPED's adjustment requires both pre-experiment data (x) and post-experiment data (y), and both the treatment and control groups receive the adjustment.