Solved – Interpretation of regression coefficients in the presence of modest correlations

multicollinearity, regression

I have a multiple regression model where I have nearly 20 independent variables. These variables are modestly correlated with each other (e.g., the maximum VIF is around 4 with most of them in the 2s).
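For reference, the VIF of regressor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that variable on all the others. Equivalently, the VIFs are the diagonal of the inverse of the regressors' correlation matrix. Below is a minimal sketch of that calculation in NumPy. The helper name `vif` and the simulated data are assumptions for illustration only and are not the data from this question.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (n_samples x n_features).

    Uses the identity VIF_j = [inverse of the correlation matrix]_jj.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical data: four regressors that share a common component z,
# which makes them modestly correlated with one another.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 1))
X = 0.8 * z + rng.normal(size=(1000, 4))

vifs = vif(X)
print(vifs)
```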

One of the coefficients is statistically significant and negative, although I expected it to be positive. I know that 'wrong signs' can arise for several reasons, such as multicollinearity, missing data, or omitted variables, but I am wondering whether there is a simpler explanation for the 'wrong sign'.

The usual interpretation of a coefficient is that it represents the change in the dependent variable when we change the corresponding independent variable by 1 unit, holding everything else constant.

However, it seems to me that the above interpretation is accurate only if the independent variables are completely uncorrelated with one another. When the independent variables are modestly correlated, increasing one of them by 1 unit means the others are also bound to go up or down by a modest amount (depending on the sign of the correlation). The only way to predict the impact of a unit change in one independent variable is then to evaluate its impact on the other independent variables and assess the overall impact on the dependent variable. Such an analysis may well show that the 'wrong sign' is a non-issue: increasing that variable by 1 unit may still result in an increase in the dependent variable, via the accompanying changes in the other independent variables in the model.

Does the above explanation make sense or am I missing something?

Best Answer

This is not an answer, but it is too long for a comment.

I would say the interpretation is accurate even with multicollinearity; the ceteris paribus coefficient is simply not the quantity you care about. If you believe that the multicollinearity arises from an approximate linear relationship among some of the regressors, that relationship could be formalized either through a constraint on the parameters (such as dropping a variable, or something more elaborate) or with a simultaneous equation approach. Without more details about the nature of your problem, it's hard to be more specific. There are some examples (28, 29 and 5) in Peter Kennedy's paper "Oh No! I Got the Wrong Sign! What Should I Do?"
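To make the distinction concrete, here is a minimal simulated sketch. It is not based on the data in the question: the two-regressor setup, the true coefficients (-1 and 3), the 0.7 correlation (a VIF of roughly 2), and all variable names are assumptions chosen for illustration. The multiple-regression coefficient on `x1` is negative and is a correct ceteris paribus effect, while the slope from regressing `y` on `x1` alone, which lets `x2` move along with `x1`, is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two regressors with correlation of about 0.7 (VIF of about 2).
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + np.sqrt(0.51) * rng.normal(size=n)

# Assumed data-generating process: holding x2 fixed, x1 lowers y.
y = -1.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(target, *cols):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(target)), *cols])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

# Ceteris paribus (partial) effects from the multiple regression.
b0, b1, b2 = ols(y, x1, x2)

# Slope when x2 is left free to move along with x1.
_, total = ols(y, x1)

# How much x2 typically moves per unit change in x1.
_, delta = ols(x2, x1)

print("partial effect of x1:", b1)
print("total association   :", total)
print("b1 + b2 * delta     :", b1 + b2 * delta)
```

For OLS fits on the same sample, the last two printed values coincide: the simple-regression slope equals `b1 + b2 * delta`. That is the "overall impact" described in the question, and it is a different quantity from `b1`, not a correction to it.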
