This issue seems to rear its ugly head all the time, and I'm trying to decapitate it for my own understanding of statistics (and sanity!).
The assumptions of general linear models (t-test, ANOVA, regression etc.) include the "assumption of normality", but I have found this is rarely described clearly.
I often come across statistics textbooks / manuals / etc. simply stating that the "assumption of normality" applies to each group (i.e., categorical X variables), and that we should be examining departures from normality for each group.
Questions:
1. Does the assumption refer to the values of Y or the residuals of Y?
2. For a particular group, is it possible to have a strongly non-normal distribution of Y values (e.g., skewed) BUT an approximately normal (or at least more normal) distribution of residuals of Y?
Other sources state that the assumption pertains to the residuals of the model (in cases where there are groups, e.g. t-tests / ANOVA), and that we should be examining departures from normality of these residuals (i.e., only one Q-Q plot/test to run).
3. Does normality of residuals for the model imply normality of residuals for the groups? In other words, should we just examine the model residuals (contrary to instructions in many texts)?
To put this in a context, consider this hypothetical example:
- I want to compare tree height (Y) between two populations (X).
- In one population the distribution of Y is strongly right-skewed (i.e., most trees short, very few tall), while the other is virtually normal.
- Height is higher overall in the normally distributed population (suggesting there may be a 'real' difference).
- Transformation of the data does not substantially improve the distribution of the first population.
4. Firstly, is it valid to compare the groups given the radically different height distributions?
5. How do I approach the "assumption of normality" here? Recall height in one population is not normally distributed. Do I examine residuals for both populations separately OR residuals for the model (t-test)?
Please refer to questions by number in replies; experience has shown me that people get lost or sidetracked easily (especially me!). Keep in mind I am not a statistician, though I have a reasonable conceptual (i.e., not technical!) understanding of statistics.
P.S., I have searched the archives and read the following threads which have not cemented my understanding:
- ANOVA assumption normality/normal distribution of residuals
- Normality of residuals vs sample data; what about t-tests?
- Is normality testing 'essentially useless'?
- Testing normality
- Assessing normality of distribution
- What tests do I use to confirm that residuals are normally distributed?
- What to do when Kolmogorov-Smirnov test is significant for residuals of parametric test but skewness and kurtosis look normal?
Best Answer
One point that may help your understanding:
If $x$ is normally distributed and $a$ and $b$ are constants (with $b \neq 0$), then $y=\frac{x-a}{b}$ is also normally distributed (but with a possibly different mean and variance).
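To spell that out (the symbols $\mu$ and $\sigma^2$ are introduced here only for illustration): if $x \sim N(\mu, \sigma^2)$ and $b \neq 0$, then

$$y = \frac{x-a}{b} \sim N\!\left(\frac{\mu-a}{b},\ \frac{\sigma^2}{b^2}\right).$$

For example, choosing $a=\mu$ and $b=\sigma$ gives $y \sim N(0, 1)$: the location and scale change, but the shape stays normal.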
The residuals are just the y values minus the estimated mean (standardized residuals are also divided by an estimate of the standard error). So, for a given group, if the y values are normally distributed then the residuals are as well, and the other way around. When we talk about theory or assumptions, it therefore does not matter which of the two we talk about, because one implies the other.
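A small simulation may make this concrete. It is only a sketch on made-up data: the group means, the sample size, and the helper functions `skewness` and `excess_kurtosis` are assumptions chosen for illustration, not anything from the question. It shows that subtracting a group's mean leaves the shape of that group's distribution unchanged, while the raw Y values pooled across groups can look very different from the pooled residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(v):
    """Sample skewness: third standardized moment (near 0 if symmetric)."""
    d = v - v.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

def excess_kurtosis(v):
    """Sample excess kurtosis: fourth standardized moment minus 3 (near 0 if normal)."""
    d = v - v.mean()
    return (d**4).mean() / (d**2).mean() ** 2 - 3.0

# Hypothetical data: Y is normal within each group, but the group means differ a lot.
n = 500
y_a = rng.normal(loc=10.0, scale=1.0, size=n)
y_b = rng.normal(loc=20.0, scale=1.0, size=n)

# Residuals of the two-group model: each value minus its own group's estimated mean.
res_a = y_a - y_a.mean()
res_b = y_b - y_b.mean()

# Within one group, subtracting a constant shifts the values but keeps the shape.
shape_unchanged = np.isclose(skewness(y_a), skewness(res_a), atol=1e-6)

# Pooled over both groups, raw Y is bimodal, but the residuals are not.
pooled_y = np.concatenate([y_a, y_b])
pooled_res = np.concatenate([res_a, res_b])

print("within-group shape unchanged:", shape_unchanged)
print("excess kurtosis, pooled Y:        ", excess_kurtosis(pooled_y))
print("excess kurtosis, pooled residuals:", excess_kurtosis(pooled_res))
```

In this setup the pooled Y values are far from normal (two separated humps), whereas the pooled residuals are close to normal, so a single Q-Q plot of the model residuals is not the same thing as a Q-Q plot of all the raw Y values.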
So for the questions this leads to:
Another point that is important to understand (but is often conflated when learning) is that there are two types of residuals here: the theoretical residuals, which are the differences between the observed values and the true theoretical model, and the observed residuals, which are the differences between the observed values and the estimates from the currently fitted model. We assume that the theoretical residuals are iid normal. The observed residuals are not exactly independent, identically distributed, or normal (but they do have a mean of 0). However, for practical purposes the observed residuals do estimate the theoretical residuals and are therefore still useful for diagnostics.
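The distinction between the two kinds of residuals can be checked by simulation, where the true mean is known. The following is a minimal sketch under assumed settings (one group of only 3 observations, a true mean of 5, unit variance, all invented for the example). It compares deviations from the true mean with deviations from the estimated mean.

```python
import numpy as np

rng = np.random.default_rng(1)

n_sims, n = 20000, 3   # many simulated groups, each with only 3 observations
true_mean = 5.0        # known here only because the data are simulated

y = rng.normal(loc=true_mean, scale=1.0, size=(n_sims, n))

# Theoretical residuals: deviations from the true model.
theoretical = y - true_mean
# Observed residuals: deviations from the mean estimated within each group.
observed = y - y.mean(axis=1, keepdims=True)

# Observed residuals are forced to sum to zero in every fitted group.
max_abs_row_sum = np.abs(observed.sum(axis=1)).max()

# Correlation between the first and second residual of a group, across simulations.
# Theory gives 0 for the theoretical residuals and -1/(n-1) for the observed ones.
corr_theoretical = np.corrcoef(theoretical[:, 0], theoretical[:, 1])[0, 1]
corr_observed = np.corrcoef(observed[:, 0], observed[:, 1])[0, 1]

# Variance of a single residual: sigma^2 versus sigma^2 * (1 - 1/n).
var_theoretical = theoretical[:, 0].var()
var_observed = observed[:, 0].var()

print("largest |sum of observed residuals|:", max_abs_row_sum)
print("correlation (theoretical, observed):", corr_theoretical, corr_observed)
print("variance    (theoretical, observed):", var_theoretical, var_observed)
```

The tiny group size is chosen to exaggerate the effect. With larger groups the correlation $-1/(n-1)$ and the variance shrinkage $1 - 1/n$ both become negligible, which is one way to see why the observed residuals remain useful for diagnostics. In unbalanced designs or regression, the variances of the observed residuals can also differ from one observation to another.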