# Experiment Design – Calculating Covariate-Adjusted Means and 95% CIs for Treatment and Control Group Separately in a Simple 2-Arm Trial

I have data for a simple 2-arm RCT that looks as follows:

    library(fabricatr)  # fabricate(), draw_binary()
    library(wakefield)  # age(), income()

    df <-
      fabricate(
        N = 50,
        treatment = draw_binary(prob = 0.5, N = N),
        age = age(n = N),
        income = income(n = N),
        female = draw_binary(prob = 0.5, N = N))


I am estimating the average treatment effect using linear regression:

    lm(income ~ treatment + age + female, data = df)


What I want to do is present a coefficient plot that shows the covariate-adjusted means, along with 95% CIs, for the treatment and control groups plotted separately. Using R, how do I calculate the covariate-adjusted mean and 95% CI when (a) treatment = 1 and (b) treatment = 0, so that the difference between those means equals the estimated average treatment effect with covariate adjustment?

It's not clear (to me) which covariate-adjusted means you are interested in, because they depend on the values of the other covariates. The estimated mean difference between treatment and control is identical whatever those values are, since the model contains no treatment-by-covariate interactions. The means themselves, however, differ depending on whether you look at, for example, an old or a young person.

So one quick way out is to choose one "typical" person and predict their conditional mean under control and treatment, including a confidence interval:

    m <- lm(income ~ treatment + age + female, data = df)
    # Most frequent value of female in the sample
    mode_gender <- unique(df$female)[which.max(tabulate(match(df$female, unique(df$female))))]
    # Predicted mean under control (0) and treatment (1) for this "typical" person
    predict(m,
            newdata = data.frame(treatment = c(0, 1), age = median(df$age), female = mode_gender),
            interval = "confidence")
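
To check that the two predicted means really differ by exactly the treatment coefficient, here is a minimal sketch in base R. The data frame is a hypothetical stand-in simulated with `rbinom()`, `sample()` and `rnorm()` (the coefficients and noise level are made up for illustration), so that it runs without `fabricatr` or `wakefield`; only the column names match the question.

```r
set.seed(1)
n <- 50

# Hypothetical stand-in data (assumed values, not the question's data)
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age       = sample(18:80, n, replace = TRUE),
  female    = rbinom(n, 1, 0.5)
)
df$income <- 20000 + 5000 * df$treatment + 300 * df$age + rnorm(n, sd = 8000)

m <- lm(income ~ treatment + age + female, data = df)

# Most frequent value of female in the sample
mode_female <- as.numeric(names(which.max(table(df$female))))

# One "typical" person, once under control and once under treatment
typical <- data.frame(treatment = c(0, 1),
                      age       = median(df$age),
                      female    = mode_female)
pred <- predict(m, newdata = typical, interval = "confidence")
pred  # columns: fit (adjusted mean), lwr and upr (95% CI bounds)

# Difference between the two adjusted means vs. the treatment coefficient
diff_means <- unname(pred[2, "fit"] - pred[1, "fit"])
all.equal(diff_means, unname(coef(m)["treatment"]))
```

The rows of `pred` correspond to control and treatment, in that order, and can be passed to any plotting function that accepts a point estimate with lower and upper bounds.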


As you require, the difference between the two conditional means is identical to the regression coefficient. You can of course choose characteristics that represent any population rather than a "typical" person. For example, you could take the full actual sample and estimate its mean (with a confidence interval) under control vs. treatment according to your model, thereby estimating the average treatment effect (ATE). In an ideal RCT, the ATE coincides with the average treatment effect on the treated (ATT) and on the controls (ATC), and the estimated regression coefficient is an unbiased estimate of it whether or not you adjust for other covariates, so the only gain from adjustment is potentially increased precision.
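
The full-sample version could be sketched as follows, again on hypothetical simulated data with made-up coefficients. It sets everyone's treatment to 0 and then to 1 and averages the model predictions. For a linear model without interactions, this is the same as predicting at the sample means of the covariates, which makes it possible to get a confidence interval directly from `predict()`. Note the assumption behind that interval: it treats the covariate means as fixed, i.e. it is conditional on the covariates observed in this sample.

```r
set.seed(2)
n <- 50

# Hypothetical stand-in data (assumed values, not the question's data)
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age       = sample(18:80, n, replace = TRUE),
  female    = rbinom(n, 1, 0.5)
)
df$income <- 20000 + 5000 * df$treatment + 300 * df$age + rnorm(n, sd = 8000)

m <- lm(income ~ treatment + age + female, data = df)

# Average prediction over the whole sample with treatment set to 0, then to 1
avg_pred <- sapply(c(control = 0, treated = 1), function(arm_value) {
  d <- df
  d$treatment <- arm_value
  mean(predict(m, newdata = d))
})

# Same point estimates, plus 95% CIs, by predicting at the covariate means
at_means <- data.frame(treatment = c(0, 1),
                       age       = mean(df$age),
                       female    = mean(df$female))
adj <- cbind(arm = c("control", "treated"),
             as.data.frame(predict(m, newdata = at_means,
                                   interval = "confidence")))
adj

all.equal(unname(avg_pred), adj$fit)                          # same means
all.equal(adj$fit[2] - adj$fit[1], unname(coef(m)["treatment"]))  # same ATE
```

The data frame `adj` has one row per arm with `fit`, `lwr` and `upr`, which is the shape a coefficient plot needs. Keep in mind that these are intervals for each arm's mean; whether the two intervals overlap is not a test of the treatment effect, whose own interval comes from `confint(m)`.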

Side note:

Linear regression does not in general estimate the ATE (see, e.g., chapter 6.3 of Morgan, S. L., & Winship, C., 2015, Counterfactuals and causal inference. Cambridge University Press). In your case, however, it does, because with randomization regression adjustment is not even necessary to get an unbiased estimate of the ATE (p. 211):

"Regression estimators with fully flexible codings of the adjustment variables do provide consistent and unbiased estimates of the ATE if either (1) the true propensity score does not differ by strata or (2) the average stratum-specific causal effect does not vary by strata."
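
As a rough illustration of the point that, under randomization, adjustment mainly buys precision, the sketch below simulates one randomized data set with a known treatment effect and compares the unadjusted and adjusted estimates. All numbers (the true effect of 5000, the strong age effect, the noise level, the sample size) are assumptions chosen for the example, and a single simulated data set is only suggestive.

```r
set.seed(3)
n <- 200

# Hypothetical randomized trial: treatment is assigned independently of age
treatment <- rbinom(n, 1, 0.5)
age       <- sample(18:80, n, replace = TRUE)
income    <- 20000 + 5000 * treatment + 800 * age + rnorm(n, sd = 5000)
sim       <- data.frame(income, treatment, age)

unadjusted <- lm(income ~ treatment, data = sim)        # difference in means
adjusted   <- lm(income ~ treatment + age, data = sim)  # covariate-adjusted

# Treatment estimate and its standard error under each model
comparison <- rbind(
  unadjusted = summary(unadjusted)$coefficients["treatment",
                                                c("Estimate", "Std. Error")],
  adjusted   = summary(adjusted)$coefficients["treatment",
                                              c("Estimate", "Std. Error")]
)
comparison
```

Both rows estimate the same quantity; because age explains a large share of the outcome variance in this simulation, the adjusted standard error comes out smaller.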