Joint Hypothesis Testing on Multicollinear Regressors

feature selection, linear, multicollinearity, regression

I was reading the following thread: link, when I came across this discussion about dropping regressors that exhibit multicollinearity from a linear regression model:

But what if you have multicollinearity and removing a variable reduces it? (This isn't the case in the original question, but often is in other data). Isn't the resulting model often superior in all sorts of ways (reduce variance of estimators, signs of coefficients more likely to reflect underlying theory, etc)? If you still use the correct (original model) degrees of freedom. – Peter Ellis Feb 13 '12 at 23:08

It is still better to include both variables. The only price you pay is the increased standard error in estimating one of the variable's effects adjusted for the other one. Joint tests of the two collinear variables are very powerful as then they combine forces rather than compete against one another. Also if you want to delete a variable, the data are incapable of telling you which one to delete. – Frank Harrell

I am curious about the part in bold (the claim that joint tests of collinear variables are very powerful). What kind of statistical test becomes more powerful when two or more multicollinear variables are included together? The textbooks I've been using to study linear regression seem to agree that we should try to remove multicollinearity in almost all cases. Thank you!

Best Answer

When the collinear variables are jointly predictive, you want to keep all of them. For example, consider predicting an animal's sex (M or F) from its length, height, and width. In some species, it is shape more than overall size that predicts sex. The predictors are highly collinear (larger animals are larger all around), but all of them are needed to characterize the animal's shape. In this case, depending on the sample size, the individual predictors may not be significant (because of the multicollinearity), while a joint test of all of them is highly significant (because together they are highly predictive of the outcome).
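The contrast between individual and joint tests can be illustrated with a small simulation. This is a minimal sketch, assuming Python with NumPy and made-up data: two hypothetical predictors `x1` and `x2` (stand-ins for, say, two size measurements) are nearly identical copies of each other, and both contribute to `y`. The helper names `ols_summary`, `simulate`, and `joint_f_test` are hypothetical and written only for this illustration. The joint test is the usual nested-model F-test, F = [(RSS_0 - RSS_1)/q] / [RSS_1/(n - p)], where RSS_0 comes from the intercept-only model, RSS_1 from the model with both predictors, q = 2 is the number of coefficients tested jointly, and p = 3 is the number of parameters in the full model.

```python
import numpy as np


def ols_summary(X, y):
    """Fit OLS by least squares; return coefficients, t-statistics, and RSS."""
    n, p = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2 = rss / (n - p)                    # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # estimated covariance of beta-hat
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats, rss


def simulate(n=100, collinearity_noise=0.01, seed=0):
    """Hypothetical data: x2 is a nearly exact copy of x1, and both affect y."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + collinearity_noise * rng.normal(size=n)  # corr(x1, x2) close to 1
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    return x1, x2, y


def joint_f_test(x1, x2, y):
    """Return t-statistics for x1 and x2 and the F-statistic for H0: b1 = b2 = 0."""
    n = len(y)
    X_full = np.column_stack([np.ones(n), x1, x2])
    _, t_stats, rss_full = ols_summary(X_full, y)
    # Restricted model is intercept-only, so its RSS is the total sum of squares.
    rss_restricted = float(((y - y.mean()) ** 2).sum())
    q = 2                                     # number of restrictions tested jointly
    df_resid = n - X_full.shape[1]
    f_stat = ((rss_restricted - rss_full) / q) / (rss_full / df_resid)
    return t_stats[1], t_stats[2], f_stat, df_resid


if __name__ == "__main__":
    x1, x2, y = simulate()
    t1, t2, f_stat, df = joint_f_test(x1, x2, y)
    print(f"corr(x1, x2) = {np.corrcoef(x1, x2)[0, 1]:.5f}")
    print(f"t(x1) = {t1:.2f}, t(x2) = {t2:.2f}  (5% two-sided critical value about 1.98, df = {df})")
    print(f"joint F(2, {df}) = {f_stat:.1f}  (5% critical value about 3.09)")
```

With this setup each slope's standard error is inflated by roughly a factor of 1/sqrt(1 - r^2), where r is the correlation between `x1` and `x2`, so the individual t-statistics usually fall well below the critical value even though both true slopes are 1. The F-statistic, which tests the two slopes together, is typically far above its critical value because the shared direction of `x1` and `x2` is estimated precisely. Because the data are random, a particular seed can still produce one "significant" t-statistic; the point is the contrast between the two kinds of test, not any specific number.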
