Logistic – Analyzing Marginal Effects in Logistic Regression and Subgroup Analysis

causality, interaction, logistic, marginal-effect, matching

I am conducting a research project that uses exact matching (via MatchIt in R) to isolate the marginal effect of a binary exposure variable on a binary outcome. I am fitting a logistic regression model after matching, but I am confused about some aspects of the analysis.

  1. Is it correct to say that by using logistic regression (with covariates to adjust for confounding) I cannot estimate the average marginal effect, but only the conditional effect? If so, is it correct to say that, for the exposure, the conditional effect, e.g., in the form of ATE, ATT, ATU, etc., is XX?

  2. Given that the exact matching procedure discarded some treated units from my matched dataset, is it correct to say that I cannot estimate the ATT, and can instead only try to isolate the ATM (Average Treatment effect in the Matched sample)? If so, when computing the conditional/marginal effect via:

     comparisons(model_matched, variables="exposure", newdata = subset(matched_dataset, exposure== 1))|> summary()
    

am I computing the average treatment effect on the treated in the matched sample (which we can call ATTM)? Or should we call it something else?

  3. I am also confused about subgroup analysis. To understand whether the impact of the exposure varies across levels of a specific categorical covariate, should I use comparisons in the form:

     comparisons(model_matched, variables="exposure", by="covariate", newdata = subset(matched_dataset, exposure == 1))|> summary()
    

or should I compute interactions via:

     comparisons(model_matched, variables=c("exposure", "covariate"), newdata = subset(matched_dataset, exposure == 1))|> summary()

Thanks a lot for the patience and help!

Best Answer

  1. The coefficient on the treatment variable in a logistic regression that includes covariates can be interpreted as a conditional treatment effect (a conditional log odds ratio). However, it is only meaningful if the outcome model is correct, i.e., there is no effect modification on the odds ratio scale by the covariates, and the log odds of the outcome is linear in the covariates and treatment. If those assumptions are true, then there is no benefit to matching. If they are false, it is ambiguous how to interpret the coefficient on treatment. A marginal effect can still be estimated from the same model by averaging contrasts of predicted probabilities over the sample, which is what comparisons() does.

  2. The code you produced does indeed compute the average treatment effect in the treated (ATT) in the matched sample. If the covariates are well balanced, there is no difference between the covariate distributions in the treated group and in the full sample, in which case this effect is equal to the average treatment effect (ATE) in the full matched sample. In this case, I usually just compute the ATE in the full matched sample, but it should not make a difference in practice unless the treatment groups are poorly balanced, in which case you should go back to matching anyway or risk extrapolation. You should make sure this choice is consistent with the estimand you chose in the call to matchit(); if you requested ATT weights with exact matching, you should estimate the ATT in the matched sample; if you requested ATE weights, you should estimate the ATE in the matched sample. Note that you should include the matching weights in the call to comparisons() as instructed in the MatchIt vignette on estimating effects.

  3. You should use the first line of code. The second counterfactually sets the values of the subgrouping variable and estimates the effect over the full sample under each value; the first estimates the effect within each subgroup. Counterfactually setting the subgrouping variable to a fixed value does not allow the covariate distribution to vary naturally with the moderator, so you can end up estimating treatment effects for combinations of covariates that never occur in the data, which requires extrapolation.
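Points 2 and 3 can be sketched end to end as follows. This is only an illustration under stated assumptions, not the asker's actual analysis: the data are simulated, the second covariate `x2`, the level names, and the seed are made up, and `avg_comparisons()` is used on the assumption of a recent version of marginaleffects, where it takes the place of `comparisons() |> summary()`. The cluster-robust `vcov = ~subclass` follows the MatchIt vignette on estimating effects and needs the sandwich package installed.

```r
library(MatchIt)
library(marginaleffects)

# Simulated data (hypothetical): binary exposure, binary outcome,
# one categorical moderator and one extra binary confounder.
set.seed(1)
n <- 2000
d <- data.frame(
  covariate = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x2        = rbinom(n, 1, 0.5)
)
d$exposure <- rbinom(n, 1, plogis(-0.5 + 0.6 * d$x2 + 0.4 * (d$covariate == "b")))
d$outcome  <- rbinom(n, 1, plogis(-1 + 0.8 * d$exposure + 0.5 * d$x2))

# Exact matching, requesting ATT weights
m_out <- matchit(exposure ~ covariate + x2, data = d,
                 method = "exact", estimand = "ATT")
matched_dataset <- match.data(m_out)  # adds `weights` and `subclass` columns

# Outcome model fitted with the matching weights; quasibinomial avoids
# warnings about non-integer weights
model_matched <- glm(outcome ~ exposure * (covariate + x2),
                     data = matched_dataset, weights = weights,
                     family = quasibinomial())

# Point 2: marginal risk difference among the treated in the matched sample,
# passing the matching weights through `wts`
att <- avg_comparisons(model_matched, variables = "exposure",
                       newdata = subset(matched_dataset, exposure == 1),
                       wts = "weights", vcov = ~subclass)

# Point 3: the same contrast within each observed level of the moderator
att_by_group <- avg_comparisons(model_matched, variables = "exposure",
                                by = "covariate",
                                newdata = subset(matched_dataset, exposure == 1),
                                wts = "weights", vcov = ~subclass)

print(att)
print(att_by_group)
```

Because `by = "covariate"` averages over the treated units actually observed in each subgroup, the other covariates keep the distribution they have within that subgroup, which is the property the answer relies on.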