# Causal Inference – Adjusting Confounding in IPW/MSM with Time-Varying Treatment

causalityconfoundingpanel data

Problem

I am playing around with what I thought would be a simple, straightforward simulation of recovering a lack of treatment effect under confounding in a longitudinal setting. I am trying to recover this null effect using inverse probability weighting (IPW) and a marginal structural model (MSM), as described in Chapter 21.2 of Hernán and Robins (2020). However, I don't seem to be able to get an unbiased estimate and I am at a loss of why.

Setup

In the most basic form of my simulations, there are three types of variables at each time point t: confounder L, treatment A, and outcome Y. The following DAG represents the setup for two timesteps $$t \in \{0, 1\}$$ with potentially further steps following:

Data from this DAG for a slightly longer period of $$t \in \{0,…,5\}$$ can be simulate for example by using the R package simcausal:

library(tidyverse)
library(glue)
library(simcausal)

set.seed(42)
D <- DAG.empty() +
node("L", t=0:5, distr="rnorm", mean = {if (t == 0) {0} else {0.5 * L[t-1]}}, sd = 0.5) +
node("A", t=0:5, distr="rbern", prob = plogis(-1 + L[t] * 1.06)) +
node("Y", t=0:5, distr="rnorm", mean = 10 + 1.5 * L[t], sd = 1)

D <- set.DAG(D)
#> ...automatically assigning order attribute to some nodes...
#> node L_0, order:1
#> node A_0, order:2
#> node Y_0, order:3
#> node L_1, order:4
#> ...
#> node Y_4, order:15
#> node L_5, order:16
#> node A_5, order:17
#> node Y_5, order:18
dat <- as_tibble(sim(D,n=1000))
#> simulating observed dataset from the DAG object
dat
#> # A tibble: 1,000 x 19
#>       ID      L_0   A_0   Y_0     L_1   A_1   Y_1      L_2   A_2   Y_2     L_3
#>    <int>    <dbl> <int> <dbl>   <dbl> <int> <dbl>    <dbl> <int> <dbl>   <dbl>
#>  1     1 -0.0203      0 10.0  -0.237      0  8.24 -0.953       0  7.52 -0.0643
#>  2     2 -0.776       0  7.28  0.0400     0 10.9  -0.0523      0 12.8   0.318
#>  3     3  0.584       1 10.7   0.175      1 11.0  -0.455       1  9.78  0.440
#>  4     4 -0.137       0  9.38 -0.558      0  8.26 -0.902       0  9.04 -0.855
#>  5     5 -0.234       0 10.1   0.252      1  9.38  0.236       0  8.81  0.0685
#>  6     6 -0.619       0  8.54  0.911      1 11.7   0.0840      0 10.8  -0.357
#>  7     7 -0.00388     1 10.5  -0.498      0  8.17  0.00416     0 10.9  -0.857
#>  8     8 -0.400       1  9.94 -0.420      0  9.54  0.0131      0 10.0  -0.149
#>  9     9 -0.267       1  9.32 -0.718      0  9.56 -0.174       0  7.95  0.154
#> 10    10  0.644       1 10.9   1.12       0 11.4   0.625       0 12.1   0.512
#> # ... with 990 more rows, and 8 more variables: A_3 <int>, Y_3 <dbl>,
#> #   L_4 <dbl>, A_4 <int>, Y_4 <dbl>, L_5 <dbl>, A_5 <int>, Y_5 <dbl>


Single time step

From this data, I am able to recover the true null effect for a single time step. Limiting the analysis to a single time, we know from the DAG that $$Y^{a} \perp A | L$$. Therefore, conditioning on $$L$$ should be sufficient to control for the confounding. Following the description in chapter 12.4 of Hernán and Robins (2020), I can use logistic regression to obtain nonstabilised IPWs $$W^{A} = 1 / f(A|L)$$ and fit the MSM $$\mathbb{E}[Y^{a}]=\beta_0 + \beta_1a$$ using a linear regression weighted by $$W^{A}$$. Doing this for $$t=5$$ gives the correct answer:

ps_mod <- glm(A_5 ~ L_5, data = dat, family = binomial)
ps <- predict(ps_mod, type = "response")
ipw <- 1 / if_else(dat\$A_5 == 0, 1-ps, ps)

lm(Y_5 ~ A_5, data = dat) %>% summary()
#>
#> Call:
#> lm(formula = Y_5 ~ A_5, data = dat)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.8492 -0.8697 -0.0358  0.9217  3.8088
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  9.86092    0.04907 200.953  < 2e-16 ***
#> A_5          0.50499    0.09097   5.551 3.63e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.307 on 998 degrees of freedom
#> Multiple R-squared:  0.02996,    Adjusted R-squared:  0.02898
#> F-statistic: 30.82 on 1 and 998 DF,  p-value: 3.631e-08

lm(Y_5 ~ A_5, data = dat, weights = ipw) %>% summary()
#>
#> Call:
#> lm(formula = Y_5 ~ A_5, data = dat, weights = ipw)
#>
#> Weighted Residuals:
#>     Min      1Q  Median      3Q     Max
#> -6.7440 -1.1592 -0.0744  1.1355  5.2034
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 10.01113    0.05736 174.522   <2e-16 ***
#> A_5          0.07162    0.08120   0.882    0.378
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.813 on 998 degrees of freedom
#> Multiple R-squared:  0.0007788,  Adjusted R-squared:  -0.0002224
#> F-statistic: 0.7779 on 1 and 998 DF,  p-value: 0.378


All time steps

Chapter 21.2 of Hernán and Robins (2020) explains how the above can be extended to multiple time steps. The weights for $$A_t$$ are now calculated for each step conditional on the treatment and covariate histories $$\bar A_{t-1}$$ and $$\bar L_{t-1}$$. The nonstabilised weights for the entire sequence $$\bar A$$ are then obtained through multiplication as:

$$W^{\bar A} = \prod_{t=0}^T \frac{1}{f(A_t|\bar A_{t-1}, \bar L_{t})}$$

In this example, $$f(A_t|\bar A_{t-1}, \bar L_{t-1}) = f(A_t|L_t)$$, which I estimate again using logistic regression. I should then be able to proceed to estimate the MSM $$\mathbb{E}[Y^{\bar a}]=\beta_0 + \beta_1cum(\bar a)$$ with appropriately weighted linear regression. Note that I hypothesised the treatment effect to depend on the cumulative sum of treatments received. Unfortunately, this strategy does not seem to work:


ipw <- rep(1, 1000)
for (i in 0:5) {
form <- as.formula(glue("A_{i} ~ L_{i}"))
ps_mod <- glm(form, data = dat, family = binomial)
ps <- predict(ps_mod, type = "response")
ipw <- ipw * 1 / if_else(dat[[glue("A_{i}")]] == 0, 1-ps, ps)
}

summary(ipw)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#>    2.409   12.118   27.875   68.280   63.604 4585.662

lm(Y_5 ~ I(A_5 + A_4 + A_3 + A_2 + A_1 + A_0), data = dat) %>% summary()
#>
#> Call:
#> lm(formula = Y_5 ~ I(A_5 + A_4 + A_3 + A_2 + A_1 + A_0), data = dat)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -3.9975 -0.8653 -0.0425  0.8785  4.1043
#>
#> Coefficients:
#>                                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                           9.66836    0.07233 133.674  < 2e-16 ***
#> I(A_5 + A_4 + A_3 + A_2 + A_1 + A_0)  0.20102    0.03516   5.717 1.43e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.305 on 998 degrees of freedom
#> Multiple R-squared:  0.03171,    Adjusted R-squared:  0.03074
#> F-statistic: 32.68 on 1 and 998 DF,  p-value: 1.434e-08

lm(Y_5 ~ I(A_5 + A_4 + A_3 + A_2 + A_1 + A_0), data = dat, weights = ipw) %>% summary()
#>
#> Call:
#> lm(formula = Y_5 ~ I(A_5 + A_4 + A_3 + A_2 + A_1 + A_0), data = dat,
#>     weights = ipw)
#>
#> Weighted Residuals:
#>     Min      1Q  Median      3Q     Max
#> -62.204  -4.413  -0.365   4.310  81.938
#>
#> Coefficients:
#>                                      Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                           9.64302    0.11080  87.029  < 2e-16 ***
#> I(A_5 + A_4 + A_3 + A_2 + A_1 + A_0)  0.23199    0.03438   6.748 2.54e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 10.15 on 998 degrees of freedom
#> Multiple R-squared:  0.04363,    Adjusted R-squared:  0.04267
#> F-statistic: 45.53 on 1 and 998 DF,  p-value: 2.544e-11


Question

I would be extremely grateful for any pointers as to why the confounded effect persist in the last model above. I've been wrecking my brain for the last 3 days but I can't figure out where I am going/thinking wrong!

I've tried various simulations (finally ending up at the simple version presented here). I've also calculated stabilisied IPWs or included each treatment individually in the MSM (to the same effect). I've also calculated IPWs using the functions in the ipw package, always ending up with the essentially same result.

Another thought is that the PS model at each time point should include all past covariates and treatment as predictors, not just the covariates at the immediately preceding time point. That is, the model for A_1 should be A_1 ~ L_1 + A_0 + L_0 + Y_0. I agree it shouldn't be necessary given the DAG but was thinking there could be something I missed, and it would otherwise be standard practice.