Solved – Two highly correlated variables where both correlate with a third: Correlation and Causation

causality, correlation, generalized linear model, regression

My question is certainly quite basic for statisticians!

Let's suppose Var1 and Var2 are highly correlated with a poor $R^2$.

Var3 is another variable that we will use as the response variable in a regression (a standard linear model with Gaussian errors, fitted by OLS). Var1 and Var2 are the explanatory variables of this regression, and no interaction effect is included. Let's call this regression regression1.

Is it possible that Var1 has a highly significant effect on Var3 in regression1 while Var2 has none (not significant at all)?

Let's assume that the causation is the following: Var1 causes Var2 and Var3. Is it plausible that both Var2 and Var1 are associated with highly significant p-values in regression1?


For example:

In a given species of fish, there is a very significant relation between the size of the fishes and the color of the fishes. But the linear model that gives this very significant p-value does not explain much of the total variance in fish color and size.

When performing a regression of depth on fish size and fish color (without the interaction term), I get two highly significant p-values (for the two explanatory variables).

Can I infer that both the size of the fishes and the color of the fishes "influence separately" the depth at which the fishes are?

Or maybe depth has an effect on the color of the fishes (because the diet is different, for example) and the color of the fishes influences the size of the fishes (because fishes that are bright need to be big to escape predators, or something like that). In that case, the size of the fishes would be associated with a significant p-value (in the regression of depth) only because it is correlated with the other explanatory variable, the color of the fishes.

Best Answer

The comment made by @user32164 still stands as I write: "highly correlated with a poor $R^2$" is contradictory. Whatever you count as highly correlated, a high correlation means a high $R^2$: in a regression of one variable on the other, $R^2$ is simply the square of the correlation.
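To make that point concrete, here is a minimal sketch in plain Python. The data are simulated and the names `size` and `color`, the slope of 0.2 and the sample size of 500 are all assumptions chosen for illustration, not anything taken from the question. It computes the Pearson correlation and the $R^2$ of a one-predictor least-squares fit separately and shows that the second is the square of the first.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation computed from centered sums of squares."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simple_ols_r2(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Simulated, purely illustrative data: a weak linear relation plus noise.
rng = random.Random(0)
size = [rng.gauss(0, 1) for _ in range(500)]
color = [0.2 * s + rng.gauss(0, 1) for s in size]

r = pearson_r(size, color)
r_squared = simple_ols_r2(size, color)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, regression R^2 = {r_squared:.4f}")
```

The two quantities agree up to floating-point error, so a poor $R^2$ in such a regression necessarily goes with a weak correlation.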

I am assuming that you measured color somehow so that it may fairly be used as a quantitative predictor in a regression model. Whether that's so is an issue that people in your field might debate, but I'll take it as read.

We know what you mean, but language such as "very significant p-value" is a little loose. A low P-value indicates that an effect, difference, relationship, whatever is significant, but the P-value itself is an indicator of significance, not something that is itself significant.

Those small points aside, we need to distinguish different kinds of question here.

  1. Statistical and causal inference. Focusing on your example, whether fish color causes depth at which fish are seen, or vice versa, or both, is a biological question on which statistical people have little to say. They might help you design an experiment to test the underlying hypotheses, but from the example as given the extent to which regression can be used to infer causation (existence and/or direction of causal relationships) is very limited. There is an enormous literature on this, but I think there is consensus that predictive ability as shown by regression is not sufficient to infer causation.

  2. Significance and strength of relationship. You appear to be confusing significance of relationship and strength of relationship at a basic level. With moderate and especially large sample size, it is perfectly possible to get significant results (at conventional levels) that are only weakly predictive. Usually, a significant result underlines that some quantity of interest is not zero, but that itself doesn't make it major or substantial scientifically or practically.

  3. Separate effects of predictors. You can't infer that predictors have separate effects just from the evidence you cite. If you have some $x_1$ as predictor and then add $x_2$ as another, whether the coefficient of $x_1$ changes is one thing to look at. You should benefit from testing the interaction. You always benefit from thinking about what the underlying science indicates about possible relations between $x_1$ and $x_2$.
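Points 2 and 3 can be illustrated with a small simulation. This is only a sketch under loudly stated assumptions: the data are made up, the causal structure (Var1 drives both Var2 and Var3, and Var2 has no effect of its own on Var3) is imposed by construction, and the effect sizes of 0.5 and the sample size of 5000 are arbitrary. The helper names `fit_one` and `fit_two` are hypothetical; they implement ordinary least squares with an intercept from the usual centered sums of squares, using only the standard library.

```python
import random
import statistics


def cross(x, y):
    """Sum of cross-products of x and y about their means."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def fit_one(x, y):
    """OLS of y on one predictor: returns (slope, t statistic, R^2)."""
    n = len(y)
    sxx, sxy, syy = cross(x, x), cross(x, y), cross(y, y)
    slope = sxy / sxx
    ss_res = syy - slope * sxy
    se = (ss_res / (n - 2) / sxx) ** 0.5
    return slope, slope / se, 1 - ss_res / syy


def fit_two(x1, x2, y):
    """OLS of y on x1 and x2, no interaction: ((b1, t1), (b2, t2))."""
    n = len(y)
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y, syy = cross(x1, y), cross(x2, y), cross(y, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    sigma2 = (syy - b1 * s1y - b2 * s2y) / (n - 3)  # residual variance
    t1 = b1 / (sigma2 * s22 / det) ** 0.5
    t2 = b2 / (sigma2 * s11 / det) ** 0.5
    return (b1, t1), (b2, t2)


# Assumed structure: var1 causes var2 and var3; var2 does not cause var3.
rng = random.Random(42)
n = 5000
var1 = [rng.gauss(0, 1) for _ in range(n)]
var2 = [0.5 * v + rng.gauss(0, 1) for v in var1]
var3 = [0.5 * v + rng.gauss(0, 1) for v in var1]

# Point 2: var2 alone is clearly "significant" yet explains little variance.
slope_alone, t_alone, r2_alone = fit_one(var2, var3)
print(f"var2 alone: slope={slope_alone:.3f}, t={t_alone:.1f}, R^2={r2_alone:.3f}")

# Point 3: once var1 is in the model, the coefficient of var2 changes a lot.
(b1, t1), (b2, t2) = fit_two(var1, var2, var3)
print(f"var1 and var2: b1={b1:.3f} (t={t1:.1f}), b2={b2:.3f} (t={t2:.1f})")
```

With a sample this large, Var2 on its own gets a large t statistic even though its $R^2$ is small, and its coefficient shrinks toward zero when Var1 is added. Note that the simulation knows the causal structure only because it was built in; with real data, the same regression output cannot tell you which variable is driving which.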
