Solved – How can you have significant correlations and insignificant coefficients

correlation, multiple regression, regression coefficients

I'm a psychology graduate, so I admit that statistics do not come naturally to me. However, I find them fascinating nonetheless.

At the moment I'm struggling with regressions, or more specifically in this instance with multiple linear regression.

I was just curious about the relationship between correlations and coefficients. In my assignment we've been asked to look at some data pertaining to stress (all made up, of course!), and the results indicate that several of the factors correlate with one another (e.g. assignment anxiety and social coping skills), as would be expected. However, only a single coefficient is significant in relation to the outcome variable (stress).

I'm sure this is probably a ludicrously easy thing to understand, but as no one has explained it to me, I feel lost about how these two things (correlations and coefficients) are related, if they are related at all.

Best Answer

Here are some results from a regression, for 74 cars, of gpm (gallons per mile) as a function of trunk, weight, length, and displacement, which are all measures of car size. Only one predictor achieves significance at conventional levels, although its P-value is pleasingly small.

. regress gpm trunk weight length displacement

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =   48.19
       Model |  .008805719     4   .00220143           Prob > F      =  0.0000
    Residual |  .003151908    69   .00004568           R-squared     =  0.7364
-------------+------------------------------           Adj R-squared =  0.7211
       Total |  .011957628    73  .000163803           Root MSE      =  .00676

------------------------------------------------------------------------------
         gpm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   .0003037   .0002702     1.12   0.265    -.0002354    .0008427
      weight |   .0000121   3.90e-06     3.11   0.003     4.35e-06    .0000199
      length |   .0000137   .0001189     0.12   0.909    -.0002235    .0002509
displacement |   4.31e-06   .0000194     0.22   0.825    -.0000344     .000043
       _cons |   .0059957    .012773     0.47   0.640    -.0194857    .0314771
------------------------------------------------------------------------------

Stata users will, or rather should, recognise this as regression output for the auto dataset. Naturally, none of the commentary here is intrinsic or specific to Stata.
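If you want to reproduce this, here is a minimal sketch. The shipped auto dataset has no gpm variable, so I am assuming the obvious definition, the reciprocal of mpg:

. sysuse auto, clear
. generate gpm = 1/mpg
. regress gpm trunk weight length displacement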

If we look at correlations for the predictors with gpm, here presented in terms of correlations and 95% confidence intervals, we see that all correlations between individual predictors and gpm are significant at the 5% level; in fact stronger statements could be made.

                          correlation   95% lower   95% upper
trunk        gpm               0.632    0.472    0.752
weight       gpm               0.854    0.778    0.906
length       gpm               0.820    0.727    0.883
displacement gpm               0.771    0.659    0.850
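(For the record: pwcorr with its sig option is one way to get these correlations and their P-values in Stata. The confidence intervals rest on Fisher's z transformation; the community-contributed ci2 command from SSC is one implementation, though I am not claiming that is how the table above was produced.)

. pwcorr gpm trunk weight length displacement, sig
. ssc install ci2
. ci2 gpm weight, corr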

It is easy to reconcile these two findings. The correlations pay absolutely no attention to any other variables except the two named. (There are ways of taking other variables into account, notably partial correlation, but we haven't done that here.) The regression, on the other hand, is a team effort: each coefficient depends not only on the associated predictor but also on the other predictors. The way it shakes out here is that the predictors are strongly correlated with each other, but weight looks like the best predictor, and given that weight is in the equation, the other predictors cannot add much.
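To make the partial correlation point concrete: pcorr reports the partial correlation of gpm with each predictor, adjusting for the others. A sketch, not output shown above; expect the partial correlations for trunk, length, and displacement to be far weaker than the plain correlations:

. pcorr gpm trunk weight length displacement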

In a real problem, you should always look at the entire correlation matrix to check the relationships among the predictors; the corresponding scatter plot matrix; and various diagnostic plots.
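In Stata terms, one way to do all of that (a sketch; the choice of diagnostics is a matter of taste, and estat vif must follow the regression):

. correlate gpm trunk weight length displacement
. graph matrix gpm trunk weight length displacement, half
. regress gpm trunk weight length displacement
. rvfplot, yline(0)
. estat vif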

Only when the predictors are uncorrelated with each other will the effects of all the predictors be the sum of the effects of the individual predictors. If you have that situation, it is often bad news, not good, as it can mean your data are mostly noise: variables measured on the same subjects usually correlate to some degree. Absent some experimental design intended to secure independence, moderate if not strong relationships among the predictors are as much to be expected as moderate to strong relationships between the predictors and the response variable.
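One way to see the "sum of individual effects" point concretely, assuming you are content to work with transformed predictors: orthog replaces the predictors with orthogonalised versions (orthogonal to the constant as well, so mutually uncorrelated), after which each multiple regression coefficient matches the corresponding simple regression coefficient:

. orthog trunk weight length displacement, generate(otrunk oweight olength odisp)
. regress gpm otrunk oweight olength odisp
. regress gpm oweight

The coefficient on oweight is identical in the last two regressions; with the original, correlated predictors it would not be.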