Regression – Dealing with Unmeasured Confounders and Highly Correlated Controls

causality, confounding, logistic, matching, regression

I am developing two models using observational data in which I have a binary outcome, a binary treatment and a series of confounders which I control for. The only difference is that the first model uses matched data (via exact matching), while the second uses the whole sample, with no matching.

Now, there's also a variable (to which I do not have access) that some colleagues point out might be a confounder. I am not convinced it is one, at least if we define confounders in line with Pearl and Cinelli's tradition (see Confounding variables in experimental study), namely as "variables that affect both the treatment and the outcome". If we instead relax the definition and define confounders as variables that are correlated with both the treatment and the outcome, I acknowledge that the unmeasured variable I am referring to can be considered a confounder.

I obviously know that, if this really is a confounder, my estimated effects will be biased. Yet, among the other variables I control for, there are a couple that I expect to be highly correlated with this unmeasured variable. Can I say that this substantially mitigates the problem of omitting this infamous unmeasured variable?

Also, to give a little more context, my outcome indicates whether a person recovered from a given condition (this is just an example standing in for my actual problem, so please do not focus on the theoretically relevant aspects of the health problem). I know that the condition itself is highly clustered in space (i.e., by neighborhood), and the type of neighborhood is the unmeasured variable I was referring to.

Given that there is almost perfect overlap between the "antecedent" of the outcome (the health condition) and the unmeasured variable (the neighborhood), that I have other variables that are highly correlated with the unmeasured neighborhood information, and assuming that a confounder implies correlation rather than causation, do you think the estimates of the model can be considered reliable, provided I clearly point out the limitation? How would you proceed?

Thanks for your help!

Best Answer

Although it is true that confounding is due to common causes of the treatment and outcome, a confounder does not have to cause both the treatment and the outcome. It needs to lie along an open backdoor path from the treatment to the outcome. See my answer here for a more precise definition of a confounder. Consider the following DAG (made at http://www.dagitty.net/):

and in particular the chain $$ A \leftarrow X_1 \leftarrow X_2 \rightarrow X_3 \rightarrow Y $$

where $A$ is the treatment and $Y$ is the outcome.

It's true that confounding is due to $X_2$, but adjusting for any one of $X_1$, $X_2$, or $X_3$ alone would be enough to remove the confounding, which means all three of them are confounders. $X_1$ causes the treatment but is only correlated with (and does not cause) the outcome. $X_3$ causes the outcome but is only correlated with (and does not cause) the treatment. Omitting all three of these would allow confounding to remain.

Consider also the additional edge $X_4 \rightarrow X_2$, attached to the chain above. $X_4$ appears to satisfy many of the supposed qualities of a confounder: it is a cause of (and correlated with) the treatment, and it is a cause of (and correlated with) the outcome. And yet it is not a confounder, because adjusting for it does not remove confounding: the path $A \leftarrow X_1 \leftarrow X_2 \rightarrow X_3 \rightarrow Y$ stays open whether or not $X_4$ is conditioned on (i.e., $X_4$ is not part of any minimally sufficient adjustment set).

All this is to say that knowing whether a variable causes the treatment and/or outcome, or is just correlated with them, is not enough to decide whether it is a confounder and whether it needs to be adjusted for. $X_1$ and $X_3$ are confounders, even though $X_1$ doesn't cause the outcome and $X_3$ doesn't cause the treatment. $X_4$ is not a confounder, even though it causes both the treatment and the outcome. The only way to know whether failing to adjust for a given variable will induce bias due to confounding is to draw a DAG and use the DAG adjustment rules to determine whether a sufficient adjustment set omits that variable.
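
To make the adjustment rule concrete, here is a small Python sketch that applies the backdoor criterion by brute-force path enumeration. It assumes a hypothetical DAG containing only the chain above, the edge $X_4 \rightarrow X_2$, and a direct effect $A \rightarrow Y$; the figure referenced above may contain more edges. The helper names are made up for illustration, and in practice a dedicated tool such as dagitty does this job.

```python
from itertools import combinations

# Hypothetical DAG: the chain A <- X1 <- X2 -> X3 -> Y from the text, the extra
# edge X4 -> X2, and an assumed direct effect A -> Y. Edges are (parent, child).
EDGES = {("X1", "A"), ("X2", "X1"), ("X2", "X3"), ("X3", "Y"), ("X4", "X2"), ("A", "Y")}


def descendants(edges, v):
    """All nodes reachable from v by following edge directions."""
    out, stack = set(), [v]
    while stack:
        node = stack.pop()
        for parent, child in edges:
            if parent == node and child not in out:
                out.add(child)
                stack.append(child)
    return out


def simple_paths(edges, start, end):
    """Every simple path from start to end, ignoring edge direction."""
    nbrs = {}
    for p, c in edges:
        nbrs.setdefault(p, set()).add(c)
        nbrs.setdefault(c, set()).add(p)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from walk(path + [n])

    yield from walk([start])


def is_blocked(edges, path, z):
    """d-separation rule for a single path given the conditioning set z."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider and node not in z and not (descendants(edges, node) & z):
            return True  # an unconditioned collider blocks the path
        if not collider and node in z:
            return True  # a conditioned chain or fork node blocks the path
    return False


def satisfies_backdoor(edges, treatment, outcome, z):
    """Backdoor criterion: z contains no descendant of the treatment and
    blocks every path that starts with an arrow into the treatment."""
    z = set(z)
    if z & descendants(edges, treatment):
        return False
    return all(
        is_blocked(edges, path, z)
        for path in simple_paths(edges, treatment, outcome)
        if (path[1], treatment) in edges
    )


def minimal_sufficient_sets(edges, treatment, outcome, candidates):
    """Sufficient sets that have no sufficient proper subset, smallest first."""
    found = []
    for k in range(len(candidates) + 1):
        for z in combinations(sorted(candidates), k):
            if satisfies_backdoor(edges, treatment, outcome, z) and not any(
                set(f) <= set(z) for f in found
            ):
                found.append(z)
    return found


print(minimal_sufficient_sets(EDGES, "A", "Y", {"X1", "X2", "X3", "X4"}))
# [('X1',), ('X2',), ('X3',)]
```

Under these assumptions, each of $X_1$, $X_2$, and $X_3$ is a minimally sufficient adjustment set on its own, while $X_4$ appears in none of them.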

Part 1: Regression

You decide to run a regression of the outcome on the treatment and confounders as a way to control for confounding by these variables, because that is what linear regression is supposed to do. However, the effect estimate is unbiased only under extremely strict conditions: first, that the treatment effect is constant across levels of the confounders, and second, that the linear model correctly describes the conditional relationship between the outcome and the confounders. To relax the first, you might include an interaction between the treatment and each confounder, allowing for heterogeneous treatment effects while still estimating the marginal effect. This is equivalent to g-computation (1), which involves using the fitted regression model to generate predicted values under treatment and under control for all units and taking the difference in the means of these predicted values as the effect estimate.
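
A minimal sketch of g-computation with treatment-by-covariate interactions, using NumPy least squares on simulated data. The data-generating process (one confounder, a continuous outcome, and a treatment effect of $1 + X$) is an assumption chosen so that the true average treatment effect is known to be 1; with a binary outcome like the one in the question, a logistic outcome model would be the more natural choice.

```python
import numpy as np


def g_computation(y, a, X):
    """Fit an outcome regression with treatment-by-covariate interactions,
    predict every unit's outcome under treatment and under control, and
    return the difference in the means of those predictions."""
    n = len(y)

    def design(a_vec):
        # Columns: intercept, treatment, covariates, treatment x covariates
        return np.column_stack([np.ones(n), a_vec, X, a_vec[:, None] * X])

    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    y1 = design(np.ones(n)) @ beta   # predicted outcomes if everyone were treated
    y0 = design(np.zeros(n)) @ beta  # predicted outcomes if no one were treated
    return float(np.mean(y1 - y0))


# Simulated (hypothetical) data: X confounds A and Y, and the effect of A is 1 + X,
# so the true average treatment effect is E[1 + X] = 1.
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 1))
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0]))).astype(float)
y = 1 + 2 * X[:, 0] + a * (1 + X[:, 0]) + rng.normal(size=n)

naive = y[a == 1].mean() - y[a == 0].mean()
print(f"naive difference: {naive:.2f}, g-computation: {g_computation(y, a, X):.2f}")
```

The unadjusted difference in means is pulled far from 1 by confounding, while the g-computation estimate lands close to it because the outcome model here happens to be correctly specified.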

That still assumes a linear model for the outcomes under treatment and control. Okay, we'll use a flexible machine-learning method like random forests instead. Well, now we can't claim our estimator is unbiased, only possibly consistent, and it still requires the specific machine learning model to approach the truth at a certain rate. Okay, we'll use Superlearner (2), a stacking method that takes on the rate of convergence of the fastest of its included models. Well, now we don't have a way to conduct inference, and the model might still be wrong. Okay, we'll use a semiparametric efficient doubly-robust estimator like augmented inverse probability weighting (AIPW) (3) or targeted minimum loss-based estimation (TMLE) (4). Well, that's only consistent if the true models fall in the Donsker class of models. Okay, we'll use cross-fitting with AIPW or TMLE to relax that requirement (5).
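
The sketch below illustrates the mechanics of cross-fitting with AIPW: nuisance models are fitted on all but one fold, predictions are made on the held-out fold, and the resulting influence-function values are averaged. Plain logistic regression and OLS stand in for Superlearner here, which is a deliberate simplification; the fold scheme, the propensity clipping at 0.01/0.99, and the simulated data are illustrative assumptions, and this is AIPW rather than TMLE.

```python
import numpy as np


def fit_logistic(X, a, iters=25):
    """Propensity model: logistic regression fitted by Newton-Raphson."""
    Z = np.column_stack([np.ones(len(a)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Z @ beta))
        hess = Z.T @ (Z * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, Z.T @ (a - p))
    return lambda Xn: 1 / (1 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ beta))


def fit_ols(X, y):
    """Outcome model: ordinary least squares."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta


def crossfit_aipw(y, a, X, k=5, seed=0):
    """Cross-fit AIPW estimate of the ATE with an influence-function SE.
    Nuisance models are fitted on k-1 folds and evaluated on the held-out fold."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % k
    phi = np.empty(n)
    for f in range(k):
        train, test = folds != f, folds == f
        e = fit_logistic(X[train], a[train])
        mu1 = fit_ols(X[train & (a == 1)], y[train & (a == 1)])
        mu0 = fit_ols(X[train & (a == 0)], y[train & (a == 0)])
        Xt, at, yt = X[test], a[test], y[test]
        et = np.clip(e(Xt), 0.01, 0.99)  # guard against extreme propensity scores
        m1, m0 = mu1(Xt), mu0(Xt)
        phi[test] = m1 - m0 + at * (yt - m1) / et - (1 - at) * (yt - m0) / (1 - et)
    return float(phi.mean()), float(phi.std(ddof=1) / np.sqrt(n))


# Toy simulated data (an assumption): the true ATE is E[1 + X1] = 1.
rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))).astype(float)
y = 1 + 2 * X[:, 0] + X[:, 1] + a * (1 + X[:, 0]) + rng.normal(size=n)
est, se = crossfit_aipw(y, a, X)
print(f"AIPW estimate {est:.3f} (SE {se:.3f})")
```

Swapping `fit_logistic` and `fit_ols` for flexible learners is where a Superlearner library would plug in; the cross-fitting loop itself would not change.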

Great. You've taken regression to its extreme, relaxing as many assumptions as possible and landing on a multiply-robust estimator (multiply-robust in the sense that if any one of many models is correct, the estimator is consistent) with generally good inference properties (and because it can be bootstrapped, getting the variance exactly right isn't a big problem). Have we solved causal inference?

You submit the results of your cross-fit TMLE estimate using Superlearner for the propensity score and potential outcome models, with a full library including highly adaptive lasso and many other models, which, under weak assumptions, is all that is required for a truly consistent estimator that converges at a parametric rate.

A reviewer reads the paper and says, "I don't believe the results of this model."

"Why not?" you say. "I used the optimal estimator with the best properties; it is consistent and semiparametric efficient with few, if any, assumptions on the functional forms of the models."

"Your estimator is consistent," says the reviewer, "but not unbiased. That means I can only trust its results in general and as N goes to infinity. How do I know you have successfully eliminated bias in the effect estimate in this dataset?"

"..."

Part 2: Matching to the Rescue

You read about a hot new method called "propensity score matching" (6). It was big in 1983, and, even in 2021, you see it in almost every paper published in specialized medical journals. You come across King and Nielsen's influential paper "Why Propensity Scores Should Not Be Used for Matching" (7) and Noah's answer on CV describing the many drawbacks to using propensity score matching. Okay, you'll use genetic matching instead (8), and minimize the energy distance between the samples (9), including a flexibly estimated propensity score as a covariate to match on. You find that balance can be improved by using substantive knowledge to incorporate exact matching and caliper constraints that prioritize balance on covariates known to be important to the outcome. You decide to use full matching to relax the requirement of 1:1 matching to include more units in the analysis (10).
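
As a rough illustration of how exact matching constraints and calipers restrict which pairs can form, here is a greedy 1:1 nearest-neighbour matcher written from scratch. It is not genetic matching, energy-distance matching, or full matching, all of which rely on optimization that packages like MatchIt provide; the function name, the greedy processing order, and the assumption that covariates are already standardized are hypothetical choices for this sketch.

```python
import numpy as np


def greedy_match(X, a, exact, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    X       covariates, assumed already on a comparable (e.g. standardized) scale
    a       0/1 treatment indicator
    exact   labels that a treated unit and its control must share (exact matching)
    caliper maximum Euclidean distance allowed within a pair
    Returns a list of (treated_index, control_index) pairs.
    """
    used, pairs = set(), []
    for i in np.flatnonzero(a == 1):  # treated units, processed in index order
        ctrl = [int(j) for j in np.flatnonzero((a == 0) & (exact == exact[i])) if int(j) not in used]
        if not ctrl:
            continue  # no unused control left in this exact-match stratum
        d = np.sqrt(((X[ctrl] - X[i]) ** 2).sum(axis=1))
        k = int(np.argmin(d))
        if d[k] <= caliper:  # otherwise the treated unit is left unmatched
            pairs.append((int(i), ctrl[k]))
            used.add(ctrl[k])
    return pairs


X = np.array([[0.0], [0.1], [5.0], [5.2], [9.0], [0.2]])
a = np.array([1, 0, 1, 0, 1, 0])
exact = np.array([0, 0, 1, 1, 0, 1])
print(greedy_match(X, a, exact, caliper=0.5))   # [(0, 1), (2, 3)]
print(greedy_match(X, a, exact, caliper=0.15))  # [(0, 1)]
```

Treated unit 4 has no unused control in its exact-match stratum and is dropped, and tightening the caliper drops unit 2 as well, which shows how these constraints shrink the sample and change which units the estimate describes.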

You estimate the treatment effect using a simple linear regression of the outcome on the treatment and the covariates, including the matching weights in the regression and using a cluster-robust standard error to account for pair membership (11). You resubmit the results of your full matching analysis, which uses exact matching and calipers for prognostically important variables and a distance matrix estimated using genetic matching on the covariates and a flexibly estimated propensity score.
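
A minimal NumPy sketch of this analysis step: weighted least squares of the outcome on the treatment and a covariate, with a CR0 sandwich standard error clustered on pair membership. The simulated pairs, the unit weights standing in for matching weights, and the absence of a small-sample correction are assumptions of the sketch; in R one would more typically fit `lm()` with weights and use a cluster-robust variance estimator such as `sandwich::vcovCL()`.

```python
import numpy as np


def weighted_ols_cluster_se(y, Z, w, cluster):
    """Weighted least squares with a CR0 cluster-robust (sandwich) covariance.

    Z should already contain an intercept column; `cluster` holds pair/subclass ids.
    Returns the coefficients and their cluster-robust standard errors.
    """
    ZW = Z * w[:, None]
    bread = np.linalg.inv(Z.T @ ZW)
    beta = bread @ (ZW.T @ y)
    resid = y - Z @ beta
    meat = np.zeros_like(bread)
    for g in np.unique(cluster):
        score = ZW[cluster == g].T @ resid[cluster == g]  # summed score within a cluster
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Hypothetical matched-pair data: pair members share a covariate value and a pair-level shock.
rng = np.random.default_rng(2)
n_pairs = 1000
x_pair = rng.normal(size=n_pairs)
shock = rng.normal(size=n_pairs)
pair = np.repeat(np.arange(n_pairs), 2)
a = np.tile([1.0, 0.0], n_pairs)
x = x_pair[pair] + rng.normal(scale=0.05, size=2 * n_pairs)
y = 2 * x + 1.0 * a + shock[pair] + rng.normal(size=2 * n_pairs)
w = np.ones_like(y)  # 1:1 matching; full matching would supply non-uniform weights

Z = np.column_stack([np.ones_like(y), a, x])
beta, se = weighted_ols_cluster_se(y, Z, w, pair)
print(f"effect {beta[1]:.2f}, cluster-robust SE {se[1]:.3f}")
```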

The reviewer reads your new manuscript. "Wow, you've learned a lot. But I still don't believe you've removed bias in the effect estimate."

"Look at the balance tables," you say. "The covariate distributions are almost identical."

"I see low standardized mean differences," says the reviewer, "but imbalances could remain on other features of the covariate distribution."

"Look at the balance tables in the appendix, which contain balance statistics for pairwise interactions, polynomials up to the 5th power of each covariate, and Kolmogorov-Smirnov statistics to compare the full covariate distributions. There are no meaningful differences between the samples, and no differences at all on the most highly prognostic covariates because of the exact matching constraints and calipers."

"I see..."

"Also, I used Branson's randomization test (12) with the energy distance as the balance statistic to show that my sample is better balanced not only than a hypothetical randomized trial using the same data, but also than a block randomized trial, and even than a covariate balance-constrained randomized trial."

"Wow, I guess I don't have much to say..."

"My outcome regression estimator isn't just consistent, it's truly unbiased in this sample. Also, because I incorporated pair membership into the analysis, my standard errors are smaller and more accurate and the resulting estimate is less sensitive to unobserved confounding* (13)."

"I get it!"

Part 3: The criticism

Frank Harrell bursts into the room. "Wait, by discarding so many units in matching, you have thrown away so much useful data and needlessly decimated your precision." Mark van der Laan follows. "Wait, by using substantive 'expertise' you are not letting the analysis method find the true patterns in the data that might have eluded researchers, and your estimator does not converge at a known rate, let alone a parametric one! And there is no guarantee that your inference is valid!" I, your humble narrator, too, join in on the dogpile. "Wait, by using exact matching constraints and calipers, you have shifted your estimand away from the ATE or any a priori describable estimand (14)! Your effect estimate may be unbiased, but unbiased for what?"

You stand there, bewildered, defeated, feeling like you have gotten nowhere since you asked your simple question on Cross Validated what feels like years ago, no closer to understanding whether you should use matching or regression to estimate causal effects.

The curtains close.

Part 4: Epilogue

In the face of uncertainty and scarcity, we are left with tradeoffs. The choice between a regression-based method and matching to estimate a causal effect depends on how you and your audience choose to manage those tradeoffs and prioritize the advantages and drawbacks of each method.

Standard regression requires strong functional form assumptions, but with advanced methods, those can be relaxed, at the cost of giving up on unbiasedness and focusing on consistency and asymptotic inference. Many of these advanced methods work best in large samples, and they still require many choices along the way (e.g., which specific estimator to use, which machine learning methods to include in the Superlearner library, how many folds to use for cross-validation and cross-fitting, etc.). Although the multiply-robust methods may guarantee consistency and fast convergence rates in general, it is not immediately clear how you can assess how well they eliminated bias in your dataset, potentially leaving one skeptical of their actual performance in your one instance.

Matching methods require few functional form assumptions because no models are required (e.g., when using a distance matrix that doesn't depend solely on the propensity score, like that resulting from genetic matching). You can control confounding by adjusting the specification of the match, focusing more effort on hard-to-balance or prognostically important variables. You can come close to guaranteeing unbiasedness by ensuring you have achieved covariate balance, which can and should be measured extremely broadly with a skeptic in mind. You can use tools for analyzing randomized trials and trials with more powerful and robust designs. This comes at the cost of possibly decimating your precision by discarding huge amounts of data, changing your estimand so that your effect estimate doesn't generalize to a meaningful population and isn't replicable, and relying on ad hoc, "artisanal" methods with no clear path for valid inference.
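
A small sketch of what measuring balance broadly can look like: weighted standardized mean differences (SMDs) and Kolmogorov-Smirnov (KS) statistics, computed on each covariate plus its square and all pairwise products. The pooled-SD denominator and the restriction to second-order terms are assumptions of this sketch (tools like cobalt offer many more options), and the toy data are constructed to have identical means but different spreads.

```python
import numpy as np
from itertools import combinations


def expand(X):
    """Covariates plus their squares and pairwise products."""
    cols = [X[:, j] for j in range(X.shape[1])]
    cols += [X[:, j] ** 2 for j in range(X.shape[1])]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack(cols)


def smd(x, a, w):
    """Weighted mean difference divided by the pooled unweighted SD."""
    t, c = a == 1, a == 0
    diff = np.average(x[t], weights=w[t]) - np.average(x[c], weights=w[c])
    return diff / np.sqrt((x[t].var(ddof=1) + x[c].var(ddof=1)) / 2)


def ks(x, a, w):
    """Largest gap between the weighted empirical CDFs of the two groups."""
    t, c = a == 1, a == 0
    grid = np.unique(x)

    def ecdf(g):
        xs, ws = x[g], w[g] / w[g].sum()
        return np.array([ws[xs <= v].sum() for v in grid])

    return float(np.max(np.abs(ecdf(t) - ecdf(c))))


def balance_table(X, a, w):
    """One (SMD, KS) row per expanded term."""
    E = expand(X)
    return [(float(round(smd(E[:, j], a, w), 3)), round(ks(E[:, j], a, w), 3)) for j in range(E.shape[1])]


# Toy data: both groups have mean 0, but the control group is more spread out.
X = np.array([[-1.0], [-0.5], [0.5], [1.0], [-3.0], [-2.0], [2.0], [3.0]])
a = np.array([1, 1, 1, 1, 0, 0, 0, 0])
w = np.ones(len(a))  # matching weights would go here
print(balance_table(X, a, w))
```

The raw covariate has an SMD of 0 but a KS statistic of 0.5, and its squared term shows a large SMD, so a table of mean differences alone would have missed the imbalance.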

The advantage matching has over regression, and the reason why I think it is so valuable and why I devoted my graduate training to understanding and improving matching and its use by applied researchers as the author of the R packages cobalt, WeightIt, MatchIt, and others, is an epistemic advantage. With matching, you can more effectively convince a reader that what you have done is trustworthy and that you have accounted for all possible objections to the observed result, and you can at least point to specific assumptions and explain how their violation might affect results. This all centers on covariate balance, the similarity between covariate distributions across the treatment groups. By reporting balance broadly and submitting the resulting matched data to a battery of tests and balance measures, you can convince yourself and your readers that the resulting effect estimate is unbiased and therefore trustworthy (given the assumptions mentioned at the beginning, though these may be tenuous, and neither matching nor regression can solve that problem).
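
One item in such a battery is the randomization-based check mentioned in Part 2. Its core idea can be sketched in a few lines: compute the energy distance between treated and control covariates, then compare it with its distribution when treatment labels are re-randomized, either completely or within pairs. This is a simplified illustration on assumed simulated pairs, not Branson's full procedure (it omits covariate-constrained designs, for example), and the function names are hypothetical.

```python
import numpy as np


def energy_distance(Xt, Xc):
    """Two-sample energy distance (V-statistic form) on Euclidean distances."""
    def mean_dist(A, B):
        return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)).mean()

    return 2 * mean_dist(Xt, Xc) - mean_dist(Xt, Xt) - mean_dist(Xc, Xc)


def randomization_distribution(X, a, pair=None, draws=500, seed=0):
    """Energy distance under re-randomized treatment labels.

    pair=None -> complete randomization (shuffle all labels)
    pair=ids  -> paired randomization (flip both labels within randomly chosen pairs);
                 assumes pair ids are 0..P-1 with one treated and one control per pair
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(draws)
    for b in range(draws):
        if pair is None:
            a_b = rng.permutation(a)
        else:
            flip = rng.integers(0, 2, size=pair.max() + 1).astype(bool)[pair]
            a_b = np.where(flip, 1 - a, a)
        stats[b] = energy_distance(X[a_b == 1], X[a_b == 0])
    return stats


# Hypothetical matched sample: each control is a near-copy of its treated partner.
rng = np.random.default_rng(4)
P = 100
base = rng.normal(size=(P, 2))
pair = np.repeat(np.arange(P), 2)
a = np.tile([1, 0], P)
X = base[pair] + rng.normal(scale=0.01, size=(2 * P, 2))

obs = energy_distance(X[a == 1], X[a == 0])
complete = randomization_distribution(X, a, draws=200)
paired = randomization_distribution(X, a, pair=pair, draws=200)
print(f"observed {obs:.4f}; share of complete randomizations at least as balanced: "
      f"{np.mean(complete <= obs):.3f}")
```

Because the simulated pairs are near-copies, almost no complete randomization achieves balance as good as the observed matched sample; the paired distribution is the relevant comparison for a block randomized design.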

However, not everyone agrees that this advantage is so important, or more important than consistency and valid asymptotic inference. There can never be consensus on this matter, because consensus requires knowing the truth, and science (including statistics research) is about searching for an inherently unknowable truth (i.e., the true parameters that govern or describe our world). That is, if we knew the true causal effect, we could know the best method to estimate it, but we don't, so we can't. We can only do our best using the knowledge we have and try to manage the inherent constraints and tradeoffs as well as we can as we fumble around in the dark using the pinpoint of light the universe has shown us.

*Only when using a special method of inference for matched samples.

1. Snowden JM, Rose S, Mortimer KM. Implementation of G-Computation on a Simulated Data Set: Demonstration of a Causal Inference Technique. Am J Epidemiol. 2011;173(7):731–738.
2. van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Statistical Applications in Genetics and Molecular Biology [electronic article]. 2007;6(1). (https://www.degruyter.com/view/j/sagmb.2007.6.issue-1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml). (Accessed October 8, 2019)
3. Daniel RM. Double Robustness. In: Wiley StatsRef: Statistics Reference Online. American Cancer Society; 2018:1–14. (http://onlinelibrary.wiley.com/doi/abs/10.1002/9781118445112.stat08068). (Accessed November 9, 2018)
4. Gruber S, van der Laan MJ. Targeted Maximum Likelihood Estimation: A Gentle Introduction. 2009;17.
5. Zivich PN, Breskin A. Machine Learning for Causal Inference: On the Use of Cross-fit Estimators. Epidemiology. 2021;32(3):393–401.
6. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
7. King G, Nielsen R. Why Propensity Scores Should Not Be Used for Matching. Polit. Anal. 2019;1–20.
8. Diamond A, Sekhon JS. Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics. 2013;95(3):932–945.
9. Huling JD, Mak S. Energy Balancing of Covariate Distributions. arXiv:2004.13962 [stat] [electronic article]. 2020. (http://arxiv.org/abs/2004.13962). (Accessed December 22, 2020)
10. Stuart EA, Green KM. Using full matching to estimate causal effects in nonexperimental studies: Examining the relationship between adolescent marijuana use and adult outcomes. Developmental Psychology. 2008;44(2):395–406.
11. Abadie A, Spiess J. Robust Post-Matching Inference. Journal of the American Statistical Association. 2020;0(ja):1–37.
12. Branson Z. Randomization Tests to Assess Covariate Balance When Designing and Analyzing Matched Datasets. Observational Studies. 2021;7:44–80.
13. Zubizarreta JR, Paredes RD, Rosenbaum PR. Matching for balance, pairing for heterogeneity in an observational study of the effectiveness of for-profit and not-for-profit high schools in Chile. The Annals of Applied Statistics. 2014;8(1):204–231.
14. Greifer N, Stuart EA. Choosing the Estimand When Matching or Weighting in Observational Studies. arXiv:2106.10577 [stat] [electronic article]. 2021. (http://arxiv.org/abs/2106.10577). (Accessed September 17, 2021)

Propensity-Score – How to Account for Moderator Variables in Propensity Score Matching or Exact Matching?

To perform a moderation analysis, you need to be able to validly estimate subgroup effects, which means confounding needs to be removed within subgroups of the moderating variable. In the context of matching, this means you must exactly match on the moderator, or equivalently, match within subgroups of the moderator (i.e., perform a separate matching routine within each subgroup).

To estimate the treatment effect, you can fit a model that includes an interaction between the moderator and all other variables in the model (including the treatment and any treatment-by-covariate interactions), then perform a marginal effects procedure within subgroups. To assess whether moderation is present, you can test whether the subgroup treatment effects differ from each other.
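
A minimal sketch of subgroup g-computation on simulated data (assumed here, with true effects of 1 and 3 in the two subgroups). Fitting a separate interacted OLS model within each level of the moderator gives the same point estimates as a single model in which every term is interacted with the moderator. Matching weights are omitted for brevity, and a real analysis would also need inference for the difference, e.g., a bootstrap that resamples matched pairs.

```python
import numpy as np


def g_comp(y, a, X):
    """Mean difference in predicted outcomes under A=1 vs A=0 from an OLS
    model with treatment-by-covariate interactions."""
    n = len(y)

    def design(t):
        return np.column_stack([np.ones(n), t, X, t[:, None] * X])

    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    return float(np.mean(design(np.ones(n)) @ beta - design(np.zeros(n)) @ beta))


def subgroup_effects(y, a, X, m):
    """Treatment effect within each level of the moderator m."""
    return {lvl.item(): g_comp(y[m == lvl], a[m == lvl], X[m == lvl]) for lvl in np.unique(m)}


# Hypothetical data: the effect is 1 when m = 0 and 3 when m = 1.
rng = np.random.default_rng(3)
n = 10_000
m = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 1)) + 0.5 * m[:, None]
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0]))).astype(float)
y = X[:, 0] + m + a * (1 + 2 * m) + rng.normal(size=n)

effects = subgroup_effects(y, a, X, m)
print(effects, "difference:", effects[1] - effects[0])
```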

Some useful resources on moderation analysis: Green and Stuart (2014), Griffin et al. (2022), and the section on moderation analysis in the MatchIt vignette.
