Solved – Testing assumptions of multiple regression

heteroscedasticity, multiple regression

  1. When doing a multiple regression and testing for homoscedasticity
    some people look at raw observations and others the residuals. Which
    is correct?

  2. Do you use raw data or residuals to test linearity?

  3. Do you test the homoscedasticity for each IV against the DV or do
    you put all IVs in at the same time and then test for
    homoscedasticity?

  4. When do you test the assumptions: before running the analysis, after, or both?

  5. What order do you do these things? Do you do any twice?

    • test for linearity
    • test for normal distribution
    • test for equal variances
    • run the multiple linear regression

Best Answer

The answers mostly derive from considering the question 'what is actually being assumed?'.

Do you know the actual assumptions?

(Note that the distributional assumptions are conditional, not marginal.)
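To make the conditional/marginal point concrete, here is a minimal sketch in Python (NumPy only) on simulated data. The data-generating values and the helper `excess_kurtosis` are made up for illustration. The response is clearly non-normal when looked at on its own, yet the errors around the fitted mean are normal, and the assumption concerns the latter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: one predictor taking two well-separated values,
# with normal errors of constant variance around a linear mean.
x = rng.choice([0.0, 10.0], size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares fit with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

def excess_kurtosis(v):
    """Sample excess kurtosis (about 0 for normal data)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**4) - 3.0)

kurt_y = excess_kurtosis(y)          # marginal distribution of the raw response
kurt_resid = excess_kurtosis(resid)  # distribution conditional on x (via residuals)

print("excess kurtosis of raw y:    ", round(kurt_y, 2))
print("excess kurtosis of residuals:", round(kurt_resid, 2))
```

The raw response is bimodal simply because the predictor is, so a check on the raw observations would flag a "problem" that has nothing to do with the model's assumptions.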

1 When doing a multiple regression and testing for homoscedasticity some people look at raw observations and others the residuals. Which is correct?

What's the actual assumption here?

2 Do you use raw data or residuals to test linearity?

Which shows deviations from the model assumptions best?

3 Do you test the homoscedasticity for each IV against the DV or do you put all IVs in at the same time and then test for homoscedasticity?

See (1)
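As a hedged illustration of why the choice matters, the sketch below uses simulated data (all values and the helpers `residuals` and `sd_ratio` are hypothetical). The errors have constant variance given both IVs, but the spread of the second IV grows with the first, so looking at the DV against the first IV alone suggests a fan shape that the full model does not have.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Hypothetical data: the spread of x2 grows with x1,
# while the error term has constant SD given both predictors.
x1 = rng.uniform(0, 5, size=n)
x2 = rng.normal(0, 0.2 + x1)
y = 1 + x1 + 2 * x2 + rng.normal(0, 1, size=n)

def residuals(columns, y):
    """Residuals from a least-squares fit of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def sd_ratio(x, resid):
    """Residual SD in the upper half of x divided by the SD in the lower half."""
    hi = x > np.median(x)
    return float(resid[hi].std() / resid[~hi].std())

# One IV at a time: the spread appears to grow with x1.
ratio_one_iv = sd_ratio(x1, residuals([x1], y))
# All IVs together: the spread is roughly constant across x1.
ratio_full = sd_ratio(x1, residuals([x1, x2], y))

print("SD ratio, y on x1 only:   ", round(ratio_one_iv, 2))
print("SD ratio, y on x1 and x2: ", round(ratio_full, 2))
```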

4 When do you test the assumptions: before running the analysis, after, or both?

What exactly do you mean by 'running the analysis' here?

(If you use residuals, how would you do it before doing the calculations?)

If you mean 'before/after doing the formal inference based on the model fit', I'd normally say 'notionally before', but in what actual way would the order make a difference?

5 What order do you do these things?

This question is confusing. The last part:

    • test for linearity
    • test for normal distribution
    • test for equal variances
    • run the multiple linear regression

should have been right after the word 'things', like so:

5 What order do you do these things (check for linearity; check for normal distribution; check for equal variances; run the multiple linear regression)?

Again, if you use residuals for anything, how would you check (NB check, not test) those assumptions before calculating the residuals?

You can't check the assumption relating to conditional variance if linearity doesn't hold.

You can't check the assumption relating to normality if homoscedasticity doesn't hold.

Linearity is the basic assumption ('is my model for the mean appropriate?').

Variance is the next most important, and can't be checked until linearity is at least approximately satisfied.

Normality is least important (provided sample sizes aren't small; the exception is prediction intervals, where it matters even at large sample sizes) and can't be checked unless your data is at least approximately homoscedastic.
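The ordering above can be sketched in Python (NumPy only) as follows. This is an illustration on simulated data, not a prescribed procedure. The helpers `fit_ols` and `binned` are hypothetical, and the binned summaries are only crude numeric stand-ins for the usual residuals-versus-fitted and spread-versus-fitted plots.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns fitted values and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return fitted, y - fitted

def binned(fitted, values, func, bins=5):
    """Apply func to `values` within equal-count bins ordered by fitted value."""
    order = np.argsort(fitted)
    return np.array([func(chunk) for chunk in np.array_split(values[order], bins)])

rng = np.random.default_rng(1)
n = 1000
X = rng.uniform(0, 5, size=(n, 2))
# Hypothetical data: the mean is linear, but the error SD grows with the first predictor.
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5 + 0.5 * X[:, 0])

# Step 0: fit the model with all IVs; the residuals do not exist before this.
fitted, resid = fit_ols(X, y)

# Step 1 (linearity): residual means across fitted-value bins should sit near zero.
bin_means = binned(fitted, resid, np.mean)

# Step 2 (variance): residual SDs across bins should be roughly constant.
bin_sds = binned(fitted, resid, np.std)

print("binned residual means:", np.round(bin_means, 2))
print("binned residual SDs:  ", np.round(bin_sds, 2))

# Step 3 (normality) would come only after the spread looks roughly constant;
# here the SDs clearly change across bins, so a normality check on these
# residuals would not be informative yet.
```

In practice the same three steps are usually done by eye with plots rather than by numbers; the point of the sketch is only the order, and that everything after the fit works from the residuals.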

Do you do any twice?

Only where it would make a difference to do so.
