Solved – “Wrong Sign” On Regression Coefficients – Hierarchical Multiple Linear Regression

data transformation, logistic, multicollinearity, multiple regression, regression coefficients

I am analyzing my data on the relationship between spirituality and negative emotional states (depression, anxiety, and stress) using a hierarchical multiple linear regression. Everything seemed to be "okay" with the results until I checked the signs of the regression coefficients!! The results are showing a positive relationship between depression and a spirituality factor which includes having a reason for living. Virtually all of the research in this area shows that there should be a negative relationship between these two factors, i.e. that higher levels of depression are correlated with lower levels of having a reason for living (without any interaction terms or moderating variables).

Here are the steps that I have completed so far in this three-part data analysis (1st part: principal components analysis / 2nd part: EFA (exploratory factor analysis) / 3rd part: hierarchical multiple linear regression):

  1. Created spirituality questionnaire and collected data from sample (n = 189).

  2. Used a principal components analysis to identify how many factors to extract in EFA.

  3. Derived four factors from EFA.

  4. Factors included log transformed questions which changed the directionality of the questionnaire (i.e. questions which were "positively" worded became negatively worded).

  5. All questions on questionnaire were standardized (transformed to z-scores) to correct for directionality problem.

  6. Negative emotional states were examined for normality. Two of the three negative emotional states had to be log transformed.

  7. Ran 3 hierarchical multiple linear regressions with DV = Either log depression, log anxiety, or stress and IVs = Age (first "block"), four spirituality factors (second "block") using enter method.

  8. Noticed that signs were incorrect, so I compared log transformed versions of DVs to non-transformed versions AND standardized DV (z-score) versions (see this article: The Independent Sign Bias: Gaining Insight from Multiple Linear Regression [PDF], p. 1)

  9. IV having to do with meaning in life had correct sign in non-transformed version of depression HOWEVER it looks like there is a collinearity issue between two of the spirituality IVs with the non-transformed version which may affect the signs!!! (Condition index > 15 and Variance proportions: .81 and .40)

  10. Looked at bivariate correlations to see if the signs were correct, and oddly enough the signs were accurate (this was only true for the non-transformed versions of the hierarchical multiple linear regressions). None of the correlations were above .6.

  11. Removed Age as an IV to see if that made a difference. It did not.

  12. Removed spirituality IVs one at a time with stepwise method rather than enter method and this only increased the amount of collinearity between the variables that remained in the final model.

I would greatly appreciate any suggestions on how to correct for the wrong signs and/or if I did anything incorrectly in my three-part data analysis.

Best Answer

Several things in your description are a bit confusing. For example, you state that taking the log transform reverses the direction of the coding, but the log by itself does not reverse coding: it is a monotonically increasing function, so it preserves the ordering of the values.

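Your main question seems to be that when you look at individual pairwise correlations the sign of each correlation is as expected, but some of the signs of the slopes in a multiple regression are the opposite of what you expect. This is not uncommon, since the interpretation of slopes is much more complex in multiple regression models: each slope describes the expected change in the outcome for a one-unit change in that predictor while all the other predictors are held fixed, which is a different question from the one a pairwise correlation answers.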
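For the simplest case of two standardized predictors, the standard textbook formula for the standardized slopes makes the possibility of a sign flip explicit. The notation here is introduced only for illustration: r_{y1} and r_{y2} are the correlations of the outcome with each predictor, and r_{12} is the correlation between the two predictors.

```latex
$$
\beta_1 = \frac{r_{y1} - r_{y2}\, r_{12}}{1 - r_{12}^2},
\qquad
\beta_2 = \frac{r_{y2} - r_{y1}\, r_{12}}{1 - r_{12}^2}
$$
```

Since the denominator is positive whenever the predictors are not perfectly correlated, the sign of \beta_2 is the sign of r_{y2} - r_{y1} r_{12}. A predictor with a positive pairwise correlation (r_{y2} > 0) therefore gets a negative slope whenever r_{y1} r_{12} > r_{y2}, which can happen even with moderate correlations among the predictors. With more than two predictors the algebra is messier, but the same mechanism applies.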

Consider this example (I read this recently, I don't get the credit for thinking of it): collect data on the change in various people's pockets. The variables to collect are the total value of the change (y), the total number of coins (x1), and the total number of coins that are not quarters (x2); if using non-US coins, x2 is the number of coins that are not the highest-value coin commonly carried. Generally x1 and x2 will both be positively correlated with y, but if you do a multiple regression using both x1 and x2, then the slope on x2 will be negative, because to increase the number of non-quarters without changing the total number of coins we need to trade quarters for other coins of lesser value, which decreases y. You could have something similar happening with your data: does it really make sense to increase the religious variable without the others changing? What is often more meaningful is to compare predicted outcomes for what would be considered common combinations of your predictor variables.
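The coin example can be checked with a small simulation. This is only a sketch under assumed conditions: the Poisson coin counts, the sample size, and the seed are made up for illustration and are not taken from any real data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 500                         # hypothetical number of people sampled

# Assumed distribution: each person carries a Poisson(3) count of each US coin.
quarters = rng.poisson(3, n)
dimes = rng.poisson(3, n)
nickels = rng.poisson(3, n)
pennies = rng.poisson(3, n)

x1 = quarters + dimes + nickels + pennies               # total number of coins
x2 = dimes + nickels + pennies                          # coins that are not quarters
y = 25 * quarters + 10 * dimes + 5 * nickels + pennies  # total value in cents

# Pairwise correlations of each predictor with the outcome.
r_y_x1 = np.corrcoef(x1, y)[0, 1]
r_y_x2 = np.corrcoef(x2, y)[0, 1]

# Multiple regression of y on an intercept, x1 and x2 by ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2 = coef

print("corr(y, x1):", r_y_x1)
print("corr(y, x2):", r_y_x2)
print("slope on x1:", b1)
print("slope on x2:", b2)

# A realistic change: add one non-quarter coin, which raises BOTH x1 and x2 by 1.
pocket_before = np.array([1.0, 12.0, 9.0])  # intercept term, x1, x2
pocket_after = np.array([1.0, 13.0, 10.0])
change = pocket_after @ coef - pocket_before @ coef  # equals b1 + b2
print("predicted change in value:", change)
```

The slope on x2 comes out negative even though x2 is positively correlated with y. The last lines follow the suggestion to compare predictions for plausible combinations of predictors: adding one non-quarter coin moves x1 and x2 together, and the predicted value goes up, as it should.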