Solved – How to test for normality in a 2×2 ANOVA

anova, assumptions, normality-assumption, residuals, spss

Study Design: I showed participants some information about sea-level rise, focusing the information in different ways, both in terms of the time-scale and the magnitude of potential rise. Thus I had a 2 (Time: 2050 or 2100) by 2 (Magnitude: Medium or High) design. There were also two control groups who received no information, only answering the questions for my DVs.

Questions:
I've always checked for normality within cells. For the 2×2 portion of this design, that would mean looking for normality within 4 groups. However, reading some discussions here has made me second-guess my methods.

First, I've read that I should be looking at the normality of the residuals. How can I check for normality of residuals (in SPSS or elsewhere)? Do I have to do this for each of the 4 groups (6 including the controls)?

I also read that normality within groups implies normality of the residuals. Is this true? (Literature references?) Again, does this mean looking at each of the 4 cells separately?

In short, what steps would you take to determine whether your (2×2) data violate the assumption of normality?

References are always appreciated, even if just to point me in the right direction.

Best Answer

Most statistics packages have ways of saving residuals from your model. Using GLM - UNIVARIATE in SPSS you can save residuals. This will add a variable to your data file representing the residual for each observation.

Once you have your residuals, you can examine them to see whether they are normally distributed, homoscedastic, and so on. For example, you could run a formal normality test on the residual variable or, perhaps more appropriately, plot the residuals to check for any major departures from normality. To examine homoscedasticity, you could plot the residuals by group.
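Outside SPSS, the same checks can be sketched in a few lines. The following is only an illustration in Python with NumPy and SciPy, using simulated data: the cell means, the sample size, and all variable names are assumptions made up for the example, not values from the study. It computes residuals as deviations from cell means, runs a Shapiro-Wilk test, gets the Q-Q plot correlation, and runs Levene's test across cells.

```python
import numpy as np
from scipy import stats

# Simulated 2x2 between-subjects data. Cell labels follow the question's
# design; the means, SD and n per cell are hypothetical.
rng = np.random.default_rng(0)
cells = [("2050", "Medium"), ("2050", "High"), ("2100", "Medium"), ("2100", "High")]
true_means = [3.0, 3.5, 3.2, 4.1]
n_per_cell = 40

# One array of DV scores per cell.
scores = [rng.normal(loc=mu, scale=1.0, size=n_per_cell) for mu in true_means]

# The full factorial model predicts each cell's mean,
# so residual = observation - its own cell mean.
resid_by_cell = [y - y.mean() for y in scores]
resid = np.concatenate(resid_by_cell)

# Formal normality test on the pooled residuals.
shapiro_stat, shapiro_p = stats.shapiro(resid)

# Q-Q plot ingredients: theoretical quantiles (osm) vs ordered residuals (osr).
# qq_r close to 1 means the points lie near a straight line.
(osm, osr), (slope, intercept, qq_r) = stats.probplot(resid, dist="norm")

# Levene's test for equal variances across the four cells.
levene_stat, levene_p = stats.levene(*resid_by_cell)

print(f"Shapiro-Wilk: W={shapiro_stat:.3f}, p={shapiro_p:.3f}")
print(f"Q-Q correlation: r={qq_r:.3f}")
print(f"Levene: W={levene_stat:.3f}, p={levene_p:.3f}")
```

Plotting `osm` against `osr` with any plotting library gives the usual Q-Q plot, which is generally more informative than the p-value alone.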

For a basic between-subjects factorial ANOVA, where homogeneity of variance holds, normality within cells means normality of residuals because your model in ANOVA is to predict group means. Thus, each residual is just the difference between an observation and the mean of its cell.
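Written out, the argument looks roughly like this. The notation is my own (it does not come from SPSS output): $i$ indexes Time, $j$ indexes Magnitude, and $k$ indexes participants within a cell.

```latex
% Cell-means form of the full 2x2 factorial model
y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2)

% Fitted value is the observed cell mean, so the residual is
\hat{y}_{ijk} = \bar{y}_{ij\cdot}, \qquad e_{ijk} = y_{ijk} - \bar{y}_{ij\cdot}
```

Within a given cell, $\bar{y}_{ij\cdot}$ is a constant, so the residuals are just the raw scores shifted to have mean zero. Their distribution has the same shape and the same variance as the raw scores in that cell. Pooling the residuals across cells then gives a single normal distribution only if every cell shares the same $\sigma^2$, which is why the homogeneity of variance condition matters.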

Response to comments below:

  • Residuals are defined relative to your model predictions. In this case your model predictions are your cell means. It is a more generalisable way of thinking about assumption testing if you focus on plotting the residuals rather than examining the distribution within each cell, even if in this particular case they are basically the same. For example, if you add a covariate (ANCOVA), residuals would be more appropriate to examine than distributions within cells.
  • For purposes of examining normality, standardised and unstandardised residuals will provide the same answer. Standardised residuals can be useful when you are trying to identify data that are poorly modelled by the model (i.e., an outlier).
  • Homogeneity of variance and homoscedasticity mean the same thing as far as I'm aware. Once again, it is common to examine this assumption by comparing the variances across groups/cells. In your case, whether you calculate variance in residuals for each cell or based on the raw data in each cell, you will get the same values. However, you can also plot residuals on the y-axis and predicted values on the x-axis. This is a more generalisable approach as it is also applicable to other situations such as where you add covariates or you are doing multiple regression.
  • A point was raised below that when you have heteroscedasticity (i.e., within-cell variance varies between cells in the population) and normally distributed residuals within cells, the resulting distribution of all residuals would be non-normal. The result would be a mixture of normal distributions with mean zero and different variances, weighted in proportion to cell sizes. The resulting distribution will have zero skew, but would have some positive excess kurtosis (heavier tails than a normal distribution). If you divide residuals by their corresponding within-cell standard deviation, then you could remove the effect of heteroscedasticity; plotting the residuals that result would provide an overall check of whether residuals are normally distributed independent of any heteroscedasticity.
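The last point can be illustrated with a small simulation. This is a sketch under made-up assumptions: the four within-cell standard deviations and the (deliberately large) cell size are hypothetical, chosen only to make the effect visible. It pools normally distributed residuals from cells with unequal variances, then repeats the check after dividing each residual by its own cell's standard deviation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical within-cell SDs: strongly heteroscedastic on purpose.
cell_sds = [0.5, 1.0, 2.0, 4.0]
n_per_cell = 5000  # large n so the sample moments are stable

# Each cell is normal; residual = observation - its cell mean.
scores = [rng.normal(loc=0.0, scale=sd, size=n_per_cell) for sd in cell_sds]
resid_by_cell = [y - y.mean() for y in scores]

# Pooled raw residuals: a mixture of normals with different variances.
pooled = np.concatenate(resid_by_cell)

# Residuals rescaled by their own within-cell standard deviation.
scaled = np.concatenate([r / r.std(ddof=1) for r in resid_by_cell])

# scipy's kurtosis uses the Fisher definition by default (normal = 0).
skew_pooled = stats.skew(pooled)
kurt_pooled = stats.kurtosis(pooled)
kurt_scaled = stats.kurtosis(scaled)

print(f"pooled residuals: skew={skew_pooled:.2f}, excess kurtosis={kurt_pooled:.2f}")
print(f"scaled residuals: excess kurtosis={kurt_scaled:.2f}")
```

With unequal variances, the pooled residuals should show clearly positive excess kurtosis and skew near zero, while the rescaled residuals should have excess kurtosis near zero. With small cells the within-cell standard deviations are themselves noisy, so the rescaling is only approximate in practice.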