High Collinearity – How to Infer Causality with High Collinearity?

causalityconfoundingmulticollinearitypropensity-scores

I recently started to ask myself how to measure the impact of education on indexes like GDP: what is the outcome of mathematics or computer science on GDP, at the country level for instance. In this example, it is not possible to modify the experiment.

I am aware of methods like Granger causality for time series, but more interested in knowing if we can gain insight directly from the couples (education, GDP) of all countries. Is it possible to use propensity scores when the variables are highly correlated? Can we reduce the collinearity by adding other variables to the model?

Best Answer

You can't estimate the effect until you know the causal structure: $Edu \to GDP$ or $Edu \leftarrow GDP$, $Edu \leftrightarrow GDP$, $Edu_{\leftarrow}^{\to} GDP$, $Edu_{\to}^{\leftrightarrow} GDP$, etc.

Learning the causal graph between two variables is an active area of research. There are currently two main classes of techniques available:

Methods based on non-Gaussianity of the error terms (e.g. LiNGAM) and
Methods based on non-linear relationships between the variables (e.g. Additive Noise Models).

However, these two classes of methods generally assume that there are no unobserved confounders and no feedback cycles. People are working on allowing unobserved confounders (see, e.g., (1), (2)). Furthermore, both methods require IID data and a pretty large sample size.

You might want to look at the solutions used by the winners of the 2013 Cause-Effect Pairs Challenge (solutions were presented at the 2013 NIPS workshop on causation, and the slides and code are all up on the website).

Adding more variables to the model could help a lot, because then the conditional independence relations become informative, and allow the use of causal inference algorithms that can handle cycles (e.g. CCD). Frederick Eberhardt is working on a SAT-solver based method that can handle cycles AND unobserved confounders (it's still in development).

Best Answer

Related Solutions

Solved – From a statistical perspective, can one infer causality using propensity scores with an observational study

Solved – Applying Generalized Linear Model to a data with high collinearity

Related Question