Solved – In a multiple linear regression, how do you know when to drop variables?

multicollinearity, multiple regression, regression coefficients

Suppose I am given a model with 3 variables ($X_1$, $X_2$ and $X_3$), and the correlations between them are not high. The highest correlation coefficient in the correlation matrix is $0.699614004$, between $X_2$ and $X_3$. Is this coefficient high enough that I should drop the variable $X_2$ to make the model precise?

Generally, how do I know when to drop a variable from the model?

Best Answer

What you are really asking is: Should I worry about collinearity among the predictors in my model? Collinearity refers to a situation where two or more of the predictors in a regression model are moderately or highly correlated. (Collinearity is also referred to as multicollinearity.)
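One common way to put a number on collinearity is the variance inflation factor (VIF): regress each predictor on all the others, and compute $\mathrm{VIF}_j = 1/(1 - R_j^2)$. A VIF of 1 means the predictor is uncorrelated with the rest. The sketch below assumes NumPy and uses a hypothetical helper `vif` on simulated data (not the asker's data), where $X_2$ and $X_3$ are built to have a population correlation of 0.7.

```python
import numpy as np

def vif(X):
    """Hypothetical helper: variance inflation factor for each column of X.

    X has shape (n_samples, n_predictors). For column j, VIF_j = 1 / (1 - R_j^2),
    where R_j^2 is from an OLS regression of column j on the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)           # all predictors except column j
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated (hypothetical) data: X1 independent, X2 and X3 correlated at about 0.7.
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = 0.7 * x2 + np.sqrt(1 - 0.7**2) * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])

print(np.round(np.corrcoef(X, rowvar=False), 2))
print(np.round(vif(X), 2))
```

In this simulation the VIF for $X_1$ should come out close to 1 and the VIFs for $X_2$ and $X_3$ close to $1/(1 - 0.7^2) \approx 1.96$, up to sampling noise.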

As other people have already pointed out, whether you need to worry about collinearity ultimately depends on the purpose of your model. If you mainly want to use the model to make predictions, collinearity is usually less of a concern than if you intend to use it to learn about the effect of the collinear predictors on the outcome variable. The extent of collinearity also matters: a moderate correlation between two predictors is far less problematic than a near-perfect one.
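To gauge the extent of collinearity for the numbers in the question, note that for standardized predictors the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix, and $\sqrt{\mathrm{VIF}_j}$ is the factor by which the standard error of coefficient $j$ is inflated relative to uncorrelated predictors. The question only reports the largest correlation ($0.699614004$ between $X_2$ and $X_3$), so the sketch below assumes, purely for illustration, that the other two correlations are 0.

```python
import numpy as np

r23 = 0.699614004  # largest correlation reported in the question (X2 vs X3)
r12 = 0.0          # hypothetical: not given in the question
r13 = 0.0          # hypothetical: not given in the question

# Correlation matrix of the predictors X1, X2, X3
R = np.array([
    [1.0, r12, r13],
    [r12, 1.0, r23],
    [r13, r23, 1.0],
])

# VIFs are the diagonal of the inverse correlation matrix
vifs = np.diag(np.linalg.inv(R))
# Factor by which each coefficient's standard error is inflated by collinearity
se_inflation = np.sqrt(vifs)

for name, v, s in zip(["X1", "X2", "X3"], vifs, se_inflation):
    print(f"{name}: VIF = {v:.2f}, SE inflation = {s:.2f}x")
```

Under these assumptions $X_2$ and $X_3$ each get a VIF of about $1/(1 - 0.6996^2) \approx 1.96$, so their standard errors are roughly 1.4 times larger than with uncorrelated predictors. That is well below the commonly quoted rule-of-thumb cutoffs of 5 or 10. Nonzero values for the unreported correlations would raise the VIFs somewhat, so it is worth plugging in the actual matrix.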

This blog post provides a nice description of when you need to address collinearity and when you don't: http://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/.