Solved – How can you have significant correlations and insignificant coefficients

correlation, multiple regression, regression coefficients

I'm a psychology graduate, so I admit that statistics do not come naturally to me. However, I find them fascinating nonetheless.

At the moment I'm struggling with regressions, or more specifically in this instance with multiple linear regression.

I was just curious about the relationship between correlations and coefficients. In my assignment we've been asked to look at some data pertaining to stress (all made up, of course!), and the results indicate that several of the factors correlate with one another (e.g. assignment anxiety and social coping skills), as would be expected. However, only a single coefficient is significant in relation to the outcome variable (stress).

I'm sure this is probably a ludicrously easy thing to understand, but as no one has explained it to me, I feel lost about how these two things (correlations and coefficients) are related, if they are related at all.

Best Answer

Here are some results from a regression, for 74 cars, of gpm (gallons per mile) as a function of trunk, weight, length, and displacement, which are all measures of car size. Only one predictor achieves significance at conventional levels, although its P-value is pleasingly small.

. regress gpm trunk weight length displacement

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =   48.19
       Model |  .008805719     4   .00220143           Prob > F      =  0.0000
    Residual |  .003151908    69   .00004568           R-squared     =  0.7364
-------------+------------------------------           Adj R-squared =  0.7211
       Total |  .011957628    73  .000163803           Root MSE      =  .00676

------------------------------------------------------------------------------
         gpm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   .0003037   .0002702     1.12   0.265    -.0002354    .0008427
      weight |   .0000121   3.90e-06     3.11   0.003     4.35e-06    .0000199
      length |   .0000137   .0001189     0.12   0.909    -.0002235    .0002509
displacement |   4.31e-06   .0000194     0.22   0.825    -.0000344     .000043
       _cons |   .0059957    .012773     0.47   0.640    -.0194857    .0314771
------------------------------------------------------------------------------

Stata users will, or rather should, recognise this as regression output for the auto dataset. Naturally, none of the commentary here is intrinsic or specific to Stata.
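If you want to reproduce this, here is a minimal sketch. The shipped auto dataset has no gpm variable, so I am assuming the obvious definition, the reciprocal of mpg:

. sysuse auto, clear
. generate gpm = 1/mpg
. regress gpm trunk weight length displacement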

If we look at correlations for the predictors with gpm, here presented in terms of correlations and 95% confidence intervals, we see that all correlations between individual predictors and gpm are significant at the 5% level; in fact stronger statements could be made.

                          correlation   95% lower   95% upper
trunk        gpm               0.632    0.472    0.752
weight       gpm               0.854    0.778    0.906
length       gpm               0.820    0.727    0.883
displacement gpm               0.771    0.659    0.850
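(For the record: pwcorr with its sig option is one way to get these correlations and their P-values in Stata. The confidence intervals rest on Fisher's z transformation; the community-contributed ci2 command from SSC is one implementation, though I am not claiming that is how the table above was produced.)

. pwcorr gpm trunk weight length displacement, sig
. ssc install ci2
. ci2 gpm weight, corr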

It is easy to reconcile these two findings. The correlations pay absolutely no attention to any other variables except the two named. (There are ways of taking other variables into account, notably partial correlation, but we haven't done that here.) The regression, on the other hand, is a team effort: each coefficient depends not only on the associated predictor but also on the other predictors. The way it shakes out here is that the predictors are strongly correlated with each other, but weight looks like the best predictor, and given that weight is in the equation, the other predictors cannot add much.
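To make the partial correlation point concrete: pcorr reports the partial correlation of gpm with each predictor, adjusting for the others. A sketch, not output shown above; expect the partial correlations for trunk, length, and displacement to be far weaker than the plain correlations:

. pcorr gpm trunk weight length displacement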

In a real problem, you should always look at the entire correlation matrix to check the relationships among the predictors; the corresponding scatter plot matrix; and various diagnostic plots.
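In Stata terms, one way to do all of that (a sketch; the choice of diagnostics is a matter of taste, and estat vif must follow the regression):

. correlate gpm trunk weight length displacement
. graph matrix gpm trunk weight length displacement, half
. regress gpm trunk weight length displacement
. rvfplot, yline(0)
. estat vif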

Only when the predictors are uncorrelated with each other will the effects of all the predictors be the sum of the effects of the individual predictors. If you have that situation, it is often bad news, not good, as it can mean your data are mostly noise: variables measured on the same subjects usually correlate to some degree. Absent some experimental design intended to secure independence, moderate if not strong relationships among the predictors are as much to be expected as moderate to strong relationships between the predictors and the response variable.
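One way to see the "sum of individual effects" point concretely, assuming you are content to work with transformed predictors: orthog replaces the predictors with orthogonalised versions (orthogonal to the constant as well, so mutually uncorrelated), after which each multiple regression coefficient matches the corresponding simple regression coefficient:

. orthog trunk weight length displacement, generate(otrunk oweight olength odisp)
. regress gpm otrunk oweight olength odisp
. regress gpm oweight

The coefficient on oweight is identical in the last two regressions; with the original, correlated predictors it would not be.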