Solved – When 2 variables are highly correlated can one be significant and the other not in a regression

generalized-linear-model, glmm, multicollinearity, multiple-regression, regression

In regression, when two predictors are correlated and each is added to a model separately, how likely is it that one will be a significant predictor of the response variable while the other is not? To me this seems unlikely, but I've encountered it in a publication. Based on my understanding and experience of multicollinearity, if one variable is highly correlated with another, both should be influential in a model. Under what conditions could this occur?

Context:
For pedagogical purposes I am replicating a GLMM analysis from a 2004 Science paper that used observational data posted with the paper. The authors are interested in causal interpretations of two predictors that are correlated (Pearson's r = 0.67, p < 0.01). They state that the 1st predictor is significant in their regression (β = 0.24, SE = 0.05, p < 0.01) while the other is not (β = 0.02, SE = 0.07, p = 0.51). The language of the paper implies that they ran two separate models, one with each predictor on its own (but see edit below). A model with just the 1st predictor also has a lower AIC than a model with just the 2nd predictor (335.9 vs. 341.0).

I can replicate much of the analysis, including the correlation, the coefficients for the 1st predictor, and, qualitatively, the difference in AIC.

Beyond not being able to replicate their analysis (perhaps I am missing some modeling detail), I don't understand how they could have two correlated predictors yet end up with only one being significant in the model.

**EDIT:** The paper implies that they are reporting coefficients from two separate models, each containing only one of the predictors. However, the coefficients they report possibly came from a single model that included both predictors at the same time. I can now replicate the β and SE for the 2nd predictor. They appear to have ignored issues of multicollinearity in running and interpreting their models.

I am still wondering whether my original intuition that there is a problem with the reported values is generally correct, or whether there are conditions under which two correlated variables will behave differently in a fitted model when entered separately.

Best Answer

The effect of two predictors being correlated is to increase the uncertainty about each one's contribution to the effect. For example, say that $Y$ increases with $X_1$, but $X_1$ and $X_2$ are correlated. Does $Y$ only appear to increase with $X_1$ because $Y$ actually increases with $X_2$ and $X_1$ is correlated with $X_2$ (and vice versa)? The difficulty in teasing these apart is reflected in the width of the standard errors of your predictors. (The SE is a measure of the uncertainty of your estimate.)

We can determine how much the variance of your predictors' sampling distributions is inflated as a result of the correlation by using the Variance Inflation Factor (VIF). For two variables, you just square their correlation and compute:
$$ \mathrm{VIF} = \frac{1}{1-r^2} $$ In your case ($r = 0.67$) the VIF is about $1.81$, meaning that the SEs are about $1.35$ times as wide. It is possible that this will leave only one predictor still significant, or neither, or even that both remain significant, depending on how far each point estimate is from the null value and how wide the SE would have been without any correlation.
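As a quick arithmetic check (plain Python, plugging in the $r = 0.67$ reported in the question):

```python
# VIF for two predictors with correlation r, and the resulting SE inflation
r = 0.67
vif = 1 / (1 - r**2)     # variance of the sampling distribution is multiplied by this
se_factor = vif ** 0.5   # standard errors are multiplied by its square root
print(round(vif, 2), round(se_factor, 2))  # 1.81 1.35
```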

To get a stronger sense of how the individual contributions of correlated variables can appear different when both are included vs. not, it may help to read my answer here: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?

To see a demonstration showing that whether the variables are significant can vary, you may want to check out my answer here: How seriously should I consider the effects of multicollinearity in my regression model?
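For a self-contained illustration of that point, here is a minimal simulation sketch (Python with statsmodels, using plain OLS rather than a GLMM, and entirely made-up data, not the paper's): $Y$ depends only on $X_1$, yet $X_2$ looks related to $Y$ when entered on its own, purely through its correlation with $X_1$, and stops being significant once $X_1$ is also in the model.

```python
# Made-up simulation: Y truly depends only on X1; X2 is merely correlated with X1 (r ~ 0.67).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.67 * x1 + np.sqrt(1 - 0.67**2) * rng.normal(size=n)  # population correlation 0.67 with x1
y = 0.5 * x1 + rng.normal(size=n)                            # X2 has no direct effect on Y

def ols(X):
    """Ordinary least squares with an intercept."""
    return sm.OLS(y, sm.add_constant(X)).fit()

m1  = ols(x1)                          # Y ~ X1 alone
m2  = ols(x2)                          # Y ~ X2 alone: picks up X1's effect via the correlation
m12 = ols(np.column_stack([x1, x2]))   # Y ~ X1 + X2: the joint model tries to tease them apart

for name, m in [("X1 only", m1), ("X2 only", m2), ("X1 + X2", m12)]:
    print(name, "coef:", np.round(m.params[1:], 2), "p:", np.round(m.pvalues[1:], 3))
```

With data generated this way, the joint model will typically show a large, significant coefficient for $X_1$ and a near-zero, non-significant one for $X_2$, even though $X_2$ on its own appears significant — the same qualitative pattern described in the question.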
