As some of the information you provided states, the two are not the same. I prefer the terminology of conditional (on covariates) and unconditional (marginal) estimates. There is a very subtle language problem that clouds the issue greatly. Analysts who favor "population average effects" have a dangerous tendency to estimate such effects from a sample with no reference to any population distribution of subject characteristics. In this sense the estimates should not be called population average estimates but rather sample average estimates. It is important to note that sample average estimates have a low chance of being transportable to the population from which the sample came, or in fact to any population. One reason for this is the somewhat arbitrary selection criteria for how subjects get into studies.
As an example, if one compares treatment A and treatment B in a binary logistic model adjusted for sex, one obtains a treatment effect that is specific to both males and females. If the sex variable is omitted from the model, a sample average odds ratio for treatment is obtained. This is in effect a comparison of some of the males on treatment A with some of the females on treatment B, due to non-collapsibility of the odds ratio. If one had a population with a different female:male frequency, this average treatment effect, coming from a marginal odds ratio for treatment, would no longer apply.
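A tiny numeric sketch of this non-collapsibility (the logistic coefficients and the 50/50 sex split below are made up purely for illustration): even when the within-sex treatment odds ratio is identical for males and females, the marginal odds ratio, formed by averaging probabilities over the sexes first, is pulled toward 1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical model: logit P(Y=1) = -1 + 1*treat + 2*female,
# so the conditional (sex-specific) treatment odds ratio is exp(1) in both sexes.
def p(treat, female):
    return sigmoid(-1 + 1 * treat + 2 * female)

def odds(prob):
    return prob / (1 - prob)

# Conditional odds ratios: identical for males and females
or_male = odds(p(1, 0)) / odds(p(0, 0))
or_female = odds(p(1, 1)) / odds(p(0, 1))

# Marginal odds ratio in a 50/50 male/female population:
# average the probabilities first, then form the odds ratio
p1 = 0.5 * p(1, 0) + 0.5 * p(1, 1)
p0 = 0.5 * p(0, 0) + 0.5 * p(0, 1)
or_marginal = odds(p1) / odds(p0)

print(or_male, or_female, or_marginal)  # marginal OR is closer to 1
```

The conditional odds ratio is exp(1) ≈ 2.72 in each sex, but the marginal odds ratio is about 2.23; with a different female:male mix it would shift again, which is exactly the transportability problem described above.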
So if one wants a quantity that pertains to individual subjects, full conditioning on covariates is required. And these conditional estimates are the ones that transport to populations, not the so-called "population average" estimates.
Another way to think about it: think of an ideal study for comparing treatment to no treatment. This would be a multi-period randomized crossover study. Then think about the next best study: a randomized trial on identical twins where one of the twins in each pair is randomly selected to get treatment A and the other is selected to get treatment B. Both of these ideal studies are mimicked by full conditioning, i.e., full covariate adjustment to get conditional and not marginal effects from the more usual parallel group randomized controlled trial.
You'll want to check out McCaffrey et al. (2013) for advice on this, not Austin & Stuart (2015), which is for binary treatments only. It's not clear to me which causal estimand you want, so I'll explain how to get weights for both.
The ATE for any pair of treatments is the effect of moving everyone from one treatment to another. In your example, one ATE would be the effect of moving the entire population from A to B, while another might be the effect of moving the entire population from B to D.
To estimate ATE weights, you take the inverse of the estimated probability of being in the group actually assigned. So, for an individual in group A, their weight would be $w_{ATE,i}=\frac{1}{e_{A,i}}$. More generally, the weights are
$$w_{ATE,i} = \sum_{j=1}^p{\frac{I(Z_i=j)}{e_{j,i}}}$$
where $j$ indexes treatment group, $I(Z_i=j)=1$ if $Z_i=j$ and $0$ otherwise, and $e_{j,i}=P(Z_i=j|X_i)$.
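As a quick sketch of this formula (the generalized propensity scores and treatment assignments below are made up), note that the sum over $j$ collapses to the single term with $I(Z_i=j)=1$, so each unit's ATE weight is just one over its probability of receiving its own treatment:

```python
# Hypothetical generalized propensity scores: e[i][j] = P(Z_i = j | X_i),
# with each unit's probabilities summing to 1 across the three groups.
e = [
    {"A": 0.2, "B": 0.5, "D": 0.3},
    {"A": 0.6, "B": 0.1, "D": 0.3},
    {"A": 0.25, "B": 0.25, "D": 0.5},
]
z = ["A", "D", "B"]  # observed treatment assignments

# ATE weight: inverse probability of the group actually assigned
w_ate = [1.0 / e_i[z_i] for e_i, z_i in zip(e, z)]
print(w_ate)  # [5.0, ~3.33, 4.0]
```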
The ATT involves choosing one group to be the "treated" or focal group. Each ATT is a comparison between another treatment group and this focal group for members of the focal group. If we let group B be the focal group, one ATT is the effect of moving from A to B for those in group B. Another ATT is the effect of moving from D to B for those in group B.
The weights for the focal group are equal to 1, and the weights for the non-focal groups are equal to the probability of being in the focal group divided by the probability of being in the group actually assigned. So,
$$w_{ATT(f),i} = I(Z_i=f)+e_{f,i}\sum_{j \ne f}{\frac{I(Z_i=j)}{e_{j,i}}}= e_{f,i}\, w_{ATE,i}$$
where $f$ is the focal group. So, just as in the binary ATT case, the ATT weights are formed by multiplying the ATE weights by the propensity score for the focal group (i.e., the probability of being in the "treated" group). In the binary ATT case, the focal group is group 1, so the probability of being in the focal group is just the propensity score.
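A small sketch of the ATT weights with made-up generalized propensity scores and group B as the focal group: focal-group units get weight 1, and every other unit gets $e_{f,i}/e_{j,i}$, which is the same as multiplying its ATE weight $1/e_{j,i}$ by $e_{f,i}$:

```python
# Hypothetical generalized propensity scores: e[i][j] = P(Z_i = j | X_i)
e = [
    {"A": 0.2, "B": 0.5, "D": 0.3},
    {"A": 0.6, "B": 0.1, "D": 0.3},
    {"A": 0.25, "B": 0.25, "D": 0.5},
]
z = ["A", "D", "B"]  # observed treatment assignments
focal = "B"

# ATT weight: P(focal group) / P(group actually assigned);
# for focal-group units this ratio is exactly 1
w_att = [e_i[focal] / e_i[z_i] for e_i, z_i in zip(e, z)]
print(w_att)  # the third unit is in the focal group and gets weight 1
```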
Note that all of these formulas apply to the binary treatment case as well.
Using the WeightIt package in R, you would specify
w.out <- weightit(Treatment ~ X1 + X2 + X3, data = data, estimand = "ATT", focal = "B")
to estimate the ATT weights for B as the focal group using multinomial logistic regression. After checking balance (e.g., using cobalt), you can estimate the outcome model as
fit <- glm(Y ~ relevel(Treatment, "B"), data = data, weights = w.out$weights)
You need to make sure the focal group is the reference level of the treatment variable for the coefficients to be valid ATT estimates.
The method to estimate representative treatment effects using regression is called g-computation and works with any outcome type as long as the effect measure can be specified as a contrast between means (e.g., a mean difference, a ratio between marginal probabilities, a ratio between marginal odds, etc.). Here's how this works:

1. Fit a model for the outcome as a function of the treatment and the covariates.
2. For each unit, use the model to compute a predicted outcome under each treatment level, setting treatment to that level while leaving the covariates at their observed values.
3. Average the predicted outcomes within each treatment level and compute the contrast of interest between these average predictions.
This method of g-computation estimates the ATE. To estimate the ATT, steps 2 and 3 should be done using only the treated units. The control units are still used to fit the model in step 1, but only the treated units are used to compute the predicted values.
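Here is a minimal sketch of steps 2 and 3, assuming step 1 has already produced a fitted logistic outcome model (the coefficients and data below are invented; in practice `predict` would be your actual fitted model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Step 1 (assumed done): a fitted outcome model, here a made-up
# logistic model of Y on a binary treatment and one covariate x
def predict(treat, x):
    return sigmoid(-0.5 + 0.8 * treat + 1.2 * x)

treat = [1, 1, 0, 0, 0]
x = [0.2, -0.4, 0.9, 0.0, -1.1]
n = len(x)

# Steps 2-3 for the ATE: predict for *everyone* under each treatment
# level, then contrast the average predictions (here a risk difference)
p1 = sum(predict(1, xi) for xi in x) / n
p0 = sum(predict(0, xi) for xi in x) / n
ate = p1 - p0

# For the ATT, average the predictions over the treated units only
treated_x = [xi for ti, xi in zip(treat, x) if ti == 1]
att = (sum(predict(1, xi) for xi in treated_x)
       - sum(predict(0, xi) for xi in treated_x)) / len(treated_x)
print(ate, att)
```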
To get standard errors, you can use bootstrapping or the delta method (the latter of which is exactly accurate when the outcome model is linear and the contrast is the difference in means but only an approximation otherwise).
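A bare-bones sketch of the bootstrap approach on toy data (here `estimate` is just a difference in group means, standing in for the full g-computation pipeline, which would refit the model on each resample):

```python
import random

random.seed(0)

# Toy outcome and binary treatment indicator
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
t = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]

def estimate(idx):
    """Effect estimate on a resampled index set (difference in means)."""
    y1 = [y[i] for i in idx if t[i] == 1]
    y0 = [y[i] for i in idx if t[i] == 0]
    if not y1 or not y0:
        return None  # degenerate resample with an empty group: skip
    return sum(y1) / len(y1) - sum(y0) / len(y0)

# Resample units with replacement and recompute the estimate each time
boots = []
while len(boots) < 500:
    est = estimate([random.randrange(len(y)) for _ in range(len(y))])
    if est is not None:
        boots.append(est)

# The bootstrap SE is the standard deviation of the resampled estimates
mean_b = sum(boots) / len(boots)
se = (sum((b - mean_b) ** 2 for b in boots) / (len(boots) - 1)) ** 0.5
print(se)
```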
In R, this is really easy using the marginaleffects package (e.g., its avg_comparisons() function). This works for any GLM, e.g., logistic regression, Poisson regression, etc. To compute contrasts other than the difference in means/risk difference, just supply arguments to comparison and transform (e.g., to get the risk ratio/relative risk, you would set comparison = "lnratioavg", transform = "exp").

This quantity is related to an AME, though that term is a bit ambiguous because of the multiple meanings of the word "marginal". The word "marginal" in AME means the instantaneous rate of change when the predictor is changed by a tiny amount. For a binary predictor, we are not changing it by a tiny amount; we are going from 0 to 1 (or whatever values you have). So AME is not an accurate way to describe this contrast, though I often use it because it is very closely related in computation and concept to a true AME. Strictly, this is a "contrast between the average adjusted predictions", which is kind of a mouthful.
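A small numeric illustration of that terminology point (the logistic coefficients below are made up): for a nonlinear model, the derivative-based "marginal effect" at a covariate value is not the same number as the discrete 0-to-1 contrast in predicted probabilities.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up logistic model: logit P(Y=1) = -0.2 + 1.5 * x
b0, b1 = -0.2, 1.5

# "Marginal" in the AME sense: instantaneous rate of change at x = 0,
# i.e., the derivative of sigmoid, p * (1 - p), times the coefficient
p = sigmoid(b0)
ame_at_0 = p * (1 - p) * b1

# For a binary predictor we instead take a discrete contrast:
# the change in predicted probability going from x = 0 to x = 1
contrast = sigmoid(b0 + b1) - sigmoid(b0)
print(ame_at_0, contrast)  # related but not equal
```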