Logistic Regression – Variables to Include After Obtaining a Minimally Sufficient Set from a DAG

causality, dag, epidemiology, logistic

I have created a DAG using DAGitty, and according to this DAG, two variables need to be controlled for to obtain an unbiased estimate of the total effect of the exposure on the outcome. However, I'm unsure whether my logistic regression model should include only the exposure, the outcome, and the two variables that appear to be confounders. I understand that "the minimally sufficient adjustment set is the list of DAG elements that require adjustment (e.g., using regression, matching, or weighting) in order to accurately estimate the magnitude of the relationship between an exposure and an outcome". In this case, can I "adjust" for these two variables (sex and age) by including them in the model in addition to my exposure and outcome?
Thanks!

Best Answer

Yes, if your DAG is correct and age and sex form the minimal sufficient adjustment set, you can include those variables in your logistic regression as controls. BUT:

  • Without knowing your research problem, it's impossible to judge the plausibility of your DAG, but in general, observational studies that adjust only for demographic 'covariates of convenience' are unlikely to provide credible causal estimates. Does your theory really warrant a DAG that has only age and sex as confounders? Or are there likely to be others that you haven't considered?

  • With logistic regression, controlling for other factors changes the estimand from a marginal odds ratio to a conditional odds ratio, which is often not desirable: the marginal odds ratio is more commonly what you're after, and conditional odds ratios may be (much) further from 1 than the marginal odds ratio. Because the odds ratio is non-collapsible, this gap can appear even when the added covariates are not confounders. Usually, I would suggest back-transforming to average marginal effects in percentage points, or you could consider using inverse probability of treatment weighting (IPTW) instead of regression control to estimate the marginal odds ratio with logistic regression.
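
To make the second point concrete, here is a minimal sketch in plain Python. It does not fit a model to data. Instead it assumes a hypothetical population with one binary confounder Z (think sex), a binary exposure X, and a known logistic outcome model, and then computes each estimand exactly. All probabilities and coefficients (P_Z, P_X1_GIVEN_Z, B0, B_X, B_Z) are made-up illustrative values, not anything taken from the question.

```python
import math

def expit(v):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# --- Hypothetical population (every number here is an illustrative assumption) ---
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z), the confounder distribution
P_X1_GIVEN_Z = {0: 0.3, 1: 0.6}   # P(X = 1 | Z = z), exposure depends on Z
B0, B_X, B_Z = -2.0, 1.0, 2.0     # logit P(Y=1 | X, Z) = B0 + B_X*X + B_Z*Z

def p_x_given_z(x, z):
    """P(X = x | Z = z), i.e. the propensity score or its complement."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1

def p_y1(x, z):
    """P(Y = 1 | X = x, Z = z) under the assumed logistic outcome model."""
    return expit(B0 + B_X * x + B_Z * z)

# 1. Conditional odds ratio: what the exposure coefficient of the adjusted
#    logistic regression targets (identical in every stratum of Z here).
conditional_or = math.exp(B_X)

# 2. Standardization: average the model's predicted risks over the
#    distribution of Z with everyone set to exposed, then to unexposed.
def standardized_risk(x):
    return sum(P_Z[z] * p_y1(x, z) for z in P_Z)

risk1, risk0 = standardized_risk(1), standardized_risk(0)
marginal_or = odds(risk1) / odds(risk0)
ame_pp = 100.0 * (risk1 - risk0)  # average marginal effect, percentage points

# 3. IPTW: weight each (x, z) cell by 1 / P(X = x | Z = z), then compare
#    the weighted outcome risks between exposure groups.
def iptw_risk(x):
    num = den = 0.0
    for z in P_Z:
        cell = P_Z[z] * p_x_given_z(x, z)   # P(X = x, Z = z)
        weight = 1.0 / p_x_given_z(x, z)    # inverse probability weight
        num += cell * weight * p_y1(x, z)
        den += cell * weight
    return num / den

iptw_or = odds(iptw_risk(1)) / odds(iptw_risk(0))

# 4. Crude (unadjusted) odds ratio, for comparison: confounded by Z.
def crude_risk(x):
    p_x = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    joint = sum(P_Z[z] * p_x_given_z(x, z) * p_y1(x, z) for z in P_Z)
    return joint / p_x

crude_or = odds(crude_risk(1)) / odds(crude_risk(0))

print(f"crude OR            : {crude_or:.3f}")
print(f"conditional OR      : {conditional_or:.3f}")
print(f"marginal OR (std.)  : {marginal_or:.3f}")
print(f"marginal OR (IPTW)  : {iptw_or:.3f}")
print(f"avg. marginal effect: {ame_pp:.2f} percentage points")
```

In this toy setup the conditional odds ratio is further from 1 than the marginal one, and the IPTW and standardized versions of the marginal odds ratio agree. With real data you would estimate the outcome model and the propensity model rather than assume them, and the two marginal estimates would generally differ somewhat.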