I have run a multiple linear regression with 4 IVs. Three of the IVs are constructs and the fourth is gender. All IVs have statistically significant correlations with the DV. All three construct IVs have partial correlation coefficients that are negative. Gender ("being female") has a positive correlation coefficient of .2 that is statistically significant.
The r, on the other hand, is positive: 0.5.
How do I interpret this? How can the overall model predict a positive effect on the DV while all of the construct IVs have a statistically significant negative effect on the DV? Is the strength of "being female" enough to overcome all the other IV effects? Yet .2 is not that strong a correlation.
Or am I missing something simple?
Best Answer
There is such a thing as a multiple correlation coefficient, although almost no one seems to know it exists. In addition, it does not come standard with output from any statistical software, as far as I know. So I suspect that you are looking at $R^2$, as @StephanKolassa suggests. You can think of this as the proportion of the variance in your DV that your model helps to explain. Unless something has gone horribly awry (see, e.g., When is $R^2$ negative?), this value will never be negative (you can't explain less than zero of what's going on).
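For reference, the usual least-squares definition, with $\hat y_i$ the fitted values and $\bar y$ the mean of the DV, is

$$R^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}.$$

When the model includes an intercept, the intercept-only model (which predicts $\bar y$ for every case) is one of the candidates least squares chooses among, so the residual sum of squares in the numerator can be no larger than the total sum of squares in the denominator. That is why $0 \le R^2 \le 1$ in the ordinary case, regardless of the signs of the individual coefficients.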
For what it's worth, the multiple correlation coefficient is the correlation between your model's predicted values, $\hat y$, and the actual values of your DV, $y$. Again, unless your model is badly misspecified, the procedure used to fit your model coefficients works in such a way that this will always be positive, even if the individual coefficients / correlations are negative. In other words, when a data point has a higher (lower) $y$ value, your model tends to predict a higher (lower) $\hat y$ value, even if this occurs when $x$ is at a lower (higher) value.
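To make this concrete for a setup like yours, here is a minimal sketch in Python (standard library only) on made-up data: three standardized constructs with negative true effects and a 0/1 "female" indicator with a positive one. The helpers `ols_fit` and `corr`, the coefficients, and the sample size are all hypothetical choices for illustration; in practice you would use your stats package's regression routine. The sketch fits the model, then computes the multiple correlation as $\text{cor}(\hat y, y)$ and compares its square with $R^2$.

```python
import random

def ols_fit(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X is a list of rows of predictor values; returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [ak - f * ac for ak, ac in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Hypothetical data: three constructs with negative true effects plus a
# 0/1 "female" indicator with a positive true effect.
random.seed(0)
n = 500
X, y = [], []
for _ in range(n):
    c1, c2, c3 = (random.gauss(0, 1) for _ in range(3))
    female = 1.0 if random.random() < 0.5 else 0.0
    X.append([c1, c2, c3, female])
    y.append(10 - 0.8 * c1 - 0.6 * c2 - 0.4 * c3 + 0.5 * female
             + random.gauss(0, 1))

b = ols_fit(X, y)
y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]

y_bar = sum(y) / n
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
R = corr(y_hat, y)  # multiple correlation coefficient

print("coefficients:", [round(v, 3) for v in b])
print(f"R = {R:.3f}, R^2 = {r2:.3f}, R*R = {R * R:.3f}")
```

The three construct coefficients come out negative and the female coefficient positive, yet $R$ is positive and $R \cdot R$ matches $R^2$. With an intercept in the model, least squares leaves the residuals uncorrelated with $\hat y$, so the squared correlation between $\hat y$ and $y$ equals $R^2$. The negative signs only set the direction in which each construct pushes $\hat y$; they don't make $\hat y$ track $y$ backwards, and no single predictor has to "overcome" the others.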
Here is a simple demonstration using only one variable to make it easier to see. I don't know if you are familiar with the statistical software R, but hopefully the code is sufficiently self-explanatory.

Below is what this looks like. Notice that you have lower $y$ values associated with higher $x$ values, but that higher $y$ values are associated with higher fitted $y$ values.
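A minimal sketch of a similar one-variable demonstration, written in Python rather than R and using made-up data (the true slope of -2, the noise level, and the seed are arbitrary assumptions): $y$ falls as $x$ rises, the fitted slope and $\text{cor}(x, y)$ are negative, but the correlation between the fitted values and $y$ is positive.

```python
import random

random.seed(42)
n = 100
x = [random.uniform(0, 10) for _ in range(n)]
# Hypothetical data: y falls as x rises (true slope -2), plus noise
y = [20 - 2 * xi + random.gauss(0, 3) for xi in x]

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Closed-form simple OLS: slope = cov(x, y) / var(x)
mx, my = mean(x), mean(y)
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx
fitted = [intercept + slope * xi for xi in x]

print(f"slope           = {slope:.3f}")           # negative
print(f"corr(x, y)      = {corr(x, y):.3f}")       # negative
print(f"corr(fitted, y) = {corr(fitted, y):.3f}")  # positive
```

With a single predictor, the fitted values are just a rescaled, sign-flipped copy of $x$, so $\text{cor}(\hat y, y) = |\text{cor}(x, y)|$: the magnitude is the same, only the sign differs.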