I'm a psychology graduate, so I admit that statistics do not come naturally to me. However, I find them fascinating nonetheless.
At the moment I'm struggling with regressions, specifically, in this instance, multiple linear regression.
I was just curious about the relationship between correlations and coefficients. In my assignment we've been asked to look at some data pertaining to stress (all made up, of course!) and the results indicate that several of the factors correlate with one another (e.g. assignment anxiety and social coping skills), as would be expected. However, only a single coefficient is significant in relation to the outcome variable (stress).
I'm sure this is probably a ludicrously easy thing to understand, but as no one has explained it to me, I feel lost about how these two things (correlations and coefficients) are related, if at all.
Best Answer
Here are some results from a regression for 74 cars of `gpm` (gallons per mile) as a function of `trunk`, `weight`, `length` and `displacement`, which are all measures of the size of cars. Only one predictor achieves significance at conventional levels, although its P-value is pleasingly small. Stata users will, or rather should, recognise regression output for the `auto` dataset. Naturally, none of the commentary here is intrinsic or specific to Stata.

If we look at correlations for the predictors with `gpm`, here presented in terms of correlations and 95% confidence intervals, we see that all correlations between individual predictors and `gpm` are significant at the 5% level; in fact, stronger statements could be made.

It is easy to reconcile these two findings. The correlations pay absolutely no attention to any other variables except the two named. (There are ways of taking other variables into account, notably partial correlation, but we haven't done that.) The regression, on the other hand, is a team effort, and each coefficient depends not only on the associated predictor but also on the other predictors. The way it shapes out here is that the predictors are strongly correlated with each other, but `weight` looks like the best predictor, and given that `weight` is in the equation, the other predictors cannot add much.

In a real problem, you should always look at the entire correlation matrix to check the relationships among the predictors; the corresponding scatter plot matrix; and various diagnostic plots.
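The phenomenon is easy to reproduce with simulated data. Here is a minimal sketch in Python (numpy only): the data are made up, not the real `auto` dataset, and the variable names simply mirror the ones above. One latent "size" factor drives all four predictors, but the response depends on `weight` alone, so every pairwise correlation with `gpm` is strong while only one regression coefficient has a large t statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 74  # same sample size as the example above

# Hypothetical data: one underlying "size" factor drives all predictors.
size = rng.normal(0, 1, n)
weight = size + 0.5 * rng.normal(0, 1, n)
length = size + 0.5 * rng.normal(0, 1, n)
trunk = size + 0.5 * rng.normal(0, 1, n)
displacement = size + 0.5 * rng.normal(0, 1, n)

# The response depends on weight alone, plus noise.
gpm = 2.0 * weight + 0.5 * rng.normal(0, 1, n)

names = ["weight", "length", "trunk", "displacement"]
X = np.column_stack([weight, length, trunk, displacement])

# Pairwise correlations with gpm: all of them are strong.
for name, x in zip(names, X.T):
    print(name, round(np.corrcoef(x, gpm)[0, 1], 2))

# Multiple regression: OLS with an intercept, plus t statistics.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, gpm, rcond=None)
resid = gpm - Xd @ beta
s2 = resid @ resid / (n - Xd.shape[1])      # residual variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
t = beta / se
print("t statistics:", dict(zip(names, np.round(t[1:], 1))))
```

In this setup the t statistic for `weight` should dwarf the others, even though `length`, `trunk` and `displacement` each correlate strongly with `gpm` on their own: once `weight` is in the equation, they have little left to add.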
Only when the predictors are uncorrelated with each other will the effects of all the predictors be the sum of the effects of individual predictors. If you have that situation, it is often bad news, not good, as it means your data are just noise. Absent some experimental design intended to secure independence, moderate if not strong relationships among the predictors are as much to be expected as moderate to strong relationships between the predictors and the response variable.
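The special case in the last paragraph can also be checked directly. In the sketch below (simulated data again), two predictors are made exactly orthogonal after centring, and the multiple-regression coefficients then coincide with the simple-regression slopes, which is precisely the "effects add up" situation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Two predictors, centred and made exactly orthogonal (a contrived design).
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
x1 = x1 - x1.mean()
x2 = x2 - x2.mean()
x2 = x2 - (x1 @ x2) / (x1 @ x1) * x1   # remove x1 component from x2

y = 1.5 * x1 - 0.5 * x2 + rng.normal(0, 1, n)
y = y - y.mean()

# Simple-regression slopes, one predictor at a time.
b1_simple = (x1 @ y) / (x1 @ x1)
b2_simple = (x2 @ y) / (x2 @ x2)

# Multiple regression with both predictors at once.
X = np.column_stack([x1, x2])
b_mult, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b1_simple, b2_simple, b_mult)
```

With orthogonal centred predictors, X'X is diagonal, so each multiple-regression coefficient reduces to its simple-regression slope; with correlated predictors, as in the car example, that identity breaks down.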