Why does collider adjustment in a shielded triplet tend to cause independence?

Tags: causality, dag, mutual-information, r, structural-equation-modeling

I created a causal model in which $X$ causes $Y$ and $Z$, and $Y$ causes $Z$ in the following way:

set.seed(2021)
N <- 10000

X <- purrr::rbernoulli(N)
Y <- X + purrr::rbernoulli(N)
Z <- 2*X + 3*Y + purrr::rbernoulli(N)

I created it this way so that the variables are discrete. That's it. The equivalent DAG would be the one below:

[Figure: DAG with edges $X \rightarrow Y$, $X \rightarrow Z$, and $Y \rightarrow Z$]

People working with causal inference are probably more used to unshielded triplets, in which case we would have no edge between $X$ and $Y$ and therefore a v-structure. In this hypothetical situation, $X$ and $Y$ are independent but become [spuriously] dependent when conditioning on $Z$, a collider.

However, going back to the diagram I showed, there is a direct association between $X$ and $Y$, and if we adjust for $Z$ we open a blocked path, in the sense that we add some spurious dependence between $X$ and $Y$ through $Z$. What's driving me nuts is that the mutual information between $X$ and $Y$ is not only larger than the mutual information between $X$ and $Y$ conditioned on $Z$, but the latter is $0$! That is,

$I(X;Y) > I(X;Y|Z) = 0$. In R:

infotheo::condinformation(X,Y)
infotheo::condinformation(X,Y,Z)

I tried changing the equations for $Y$ and $Z$ and yet, the zero is always there (almost always, check the end of the question) for $I(X;Y|Z)$, whenever it's a shielded triplet. Even if I do it with continuous variables and normal distributions for the noise, I still find the same thing.

set.seed(2021)
N <- 10000

X <- rnorm(N, mean=10, sd=2)
Y <- X + rnorm(N, mean=10, sd=2)
Z <- X + Y + rnorm(N, mean=10, sd=2)

miic::discretizeMutual(X,Y, plot=FALSE)$info
miic::discretizeMutual(X,Y, matrix_u=matrix(Z), plot=FALSE)$info

But then, if I change a bit the structural equation for $Z$, I get something different from zero.

set.seed(2021)
N <- 10000

X <- rnorm(N, mean=10, sd=2)
Y <- X + rnorm(N, mean=10, sd=2)
Z <- 2*X + 3*Y + rnorm(N, mean=10, sd=2)

miic::discretizeMutual(X,Y, plot=FALSE)$info
miic::discretizeMutual(X,Y, matrix_u=matrix(Z), plot=FALSE)$info

I also get a value different from zero if I make the distribution of the noise in $Z$ explicitly different from the noise in $X$ and $Y$.

set.seed(2021)
N <- 10000

X <- rnorm(N, mean=10, sd=2)
Y <- X + rnorm(N, mean=10, sd=2)
Z <- X + Y + rnorm(N, mean=100, sd=10)

miic::discretizeMutual(X,Y, plot=FALSE)$info
miic::discretizeMutual(X,Y, matrix_u=matrix(Z), plot=FALSE)$info

I don't understand what's happening here. I tried to draw a few diagrams, see what paths would be blocked or opened, what would happen with correlation between the noises, but I can't think of a way that adjusting for the collider $Z$ would make $X$ and $Y$ independent. A hypothesis would be some sort of cancelling of effects between the two paths, but I changed the equations in ways that I didn't expect it to happen and still… The $0$ is there.

Could you please explain to me what's happening here? Both analytically, if possible, and intuitively. By intuitively, I mean something like the intuitive explanations for unshielded triplets: adjusting for $B$ in $A \rightarrow B \leftarrow C$ makes $A$ and $C$ dependent; adjusting for $B$ in $A \rightarrow B \rightarrow C$ or $A \leftarrow B \rightarrow C$ makes $A$ and $C$ independent. Something along these lines 🙂

Best Answer

A formal example of a linear causal model that shares your DAG can be found in my answer to the question "Which OLS assumptions are colliders violating?"

I consider (there and here) the case where all noises are mutually independent. Here I additionally consider the particular case where the noises are standard Normal. As in your example, I start with all three causal parameters equal to $1$.

As shown in my example (in the link), the causal effect of $X$ on $Y$ can be consistently estimated from the regression of $Y$ on $X$. No controls are needed. Indeed, if we add the collider $Z$ as a control, the regression coefficient of $X$ is no longer a consistent estimator of the causal effect of $X$ on $Y$; worse, it does not represent any causal parameter of the SCM. It is a useless regression coefficient.

Actually, under the parametrization just suggested, this useless regression coefficient converges exactly to $0$; given the normality assumption, we can then say that $X$ and $Y$ are independent conditional on $Z$.
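To sketch why this coefficient vanishes, write the model as $X = \varepsilon_X$, $Y = X + \varepsilon_Y$, $Z = X + Y + \varepsilon_Z$, with independent standard Normal noises. The symbols $\varepsilon_X, \varepsilon_Y, \varepsilon_Z$ and $\beta_X$ are notation assumed here for illustration and need not match the linked example. Substituting gives $Z = 2X + \varepsilon_Y + \varepsilon_Z$, so the population moments and the coefficient on $X$ in the regression of $Y$ on $X$ and $Z$ are

$$
\begin{aligned}
\operatorname{Var}(X) &= 1, \quad \operatorname{Cov}(X,Y) = 1, \quad \operatorname{Cov}(X,Z) = 2, \quad \operatorname{Cov}(Y,Z) = 3, \quad \operatorname{Var}(Z) = 6, \\
\beta_X &= \frac{\operatorname{Cov}(X,Y)\,\operatorname{Var}(Z) - \operatorname{Cov}(Y,Z)\,\operatorname{Cov}(X,Z)}{\operatorname{Var}(X)\,\operatorname{Var}(Z) - \operatorname{Cov}(X,Z)^2}
= \frac{1 \cdot 6 - 3 \cdot 2}{1 \cdot 6 - 2^2} = 0 .
\end{aligned}
$$

Because $(X, Y, Z)$ is jointly Gaussian in this setup, $\beta_X = 0$ is equivalent to a zero partial correlation between $X$ and $Y$ given $Z$, which for Gaussian variables is the same as conditional independence.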

Now you said that

I tried changing the equations for $Y$ and $Z$ and yet, the zero is always there (almost always, check the end of the question) for $I(X;Y|Z)$, whenever it's a shielded triplet. Even if I do it with continuous variables and normal distributions for the noise, I still find the same thing. ... But then, if I change a bit the structural equation for $Z$, I get something different from zero. ... A hypothesis would be some sort of cancelling of effects between the two paths, but I changed the equations in ways that I didn't expect it to happen and still... The 0 is there.

This does not seem completely clear or correct to me. In my example, if the causal parameter in the structural equation for $Y$ is different from $1$ (with the other two parameters kept at $1$), the useless regression coefficient becomes different from zero. For example, if that causal parameter is $2$, the claimed conditional independence does not hold: the useless regression coefficient converges to $0.5$.

Moreover, following the alternative parameterization that you suggest ($2$ and $3$ as causal parameters in the structural equation for $Z$, keeping $1$ in the equation for $Y$), the useless regression coefficient converges to $-0.5$.

In general, different values of the causal parameters can produce a zero, positive, or negative value for the useless regression coefficient. Moreover, even the distribution of the structural errors matters.
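The three cases discussed so far fit one expression. As a sketch, assume the linear model below with independent standard Normal noises, where $a$, $b$, $c$ are hypothetical names (introduced here, not in the linked example) for the effects $X \rightarrow Y$, $X \rightarrow Z$ and $Y \rightarrow Z$, and $\beta_X$ denotes the limit of the coefficient on $X$ in the regression of $Y$ on $X$ and $Z$:

$$
\begin{aligned}
X &= \varepsilon_X, \qquad Y = aX + \varepsilon_Y, \qquad Z = bX + cY + \varepsilon_Z, \\
\beta_X &= \frac{\operatorname{Cov}(X,Y)\,\operatorname{Var}(Z) - \operatorname{Cov}(Y,Z)\,\operatorname{Cov}(X,Z)}{\operatorname{Var}(X)\,\operatorname{Var}(Z) - \operatorname{Cov}(X,Z)^2}
= \frac{a - bc}{1 + c^2} .
\end{aligned}
$$

Plugging in $(a,b,c) = (1,1,1)$ gives $0$, $(2,1,1)$ gives $1/2$, and $(1,2,3)$ gives $-1/2$. The zero therefore appears only when $a = bc$. If the noise variances are allowed to differ, say $\sigma_Y^2$ for $\varepsilon_Y$ and $\sigma_Z^2$ for $\varepsilon_Z$, the same calculation gives

$$
\beta_X = \frac{a\,\sigma_Z^2 - bc\,\sigma_Y^2}{\sigma_Z^2 + c^2\,\sigma_Y^2},
$$

so changing only the scale of the noise in $Z$ is enough to move the coefficient away from zero.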

Finally

Why does collider adjustment in a shielded triplet tend to cause independence?

The independence you describe can happen in particular cases (specific parameter combinations), but in general it does not hold. Indeed, the general message is that controlling for a collider is a bad idea.
