I think Questions 1 and 2 are interconnected. First, the homogeneity-of-variance assumption comes from the assumed error distribution $\boldsymbol \epsilon \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$. But this assumption can be relaxed to more general variance structures, in which case homogeneity is no longer required. In other words, it really depends on how the distribution of $\boldsymbol \epsilon$ is assumed.
Second, the conditional residuals are used to check the distribution of (and thus any assumptions about) $\boldsymbol \epsilon$, whereas the marginal residuals can be used to check the total variance structure.
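For a concrete illustration (my own sketch, not part of the original answer; the random-intercept model and the Orthodont data from nlme are just placeholders), conditional and marginal residuals can be extracted like this:
library(nlme)
# random-intercept model on the Orthodont data shipped with nlme
fit = lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
r_cond = resid(fit, level = 1)   # conditional residuals: y - X*beta - Z*b
r_marg = resid(fit, level = 0)   # marginal residuals:    y - X*beta
qqnorm(r_cond); qqline(r_cond)   # check the assumed distribution of epsilon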
It is important to check the normality of the residuals rather than the normality of the collection of all responses.
A mixture of normal observations need not be normal. I will give an illustration with $g = 4$ groups and $n = 10$ replications in each group. The data are simulated as normal with several different means and equal variances, yet a Shapiro-Wilk test rejects normality for the $gn = 40$ observations taken together.
set.seed(1234)
g = 4; n = 10
# four normal groups with different means and a common SD of 5
x1 = rnorm(n, 20, 5); x2 = rnorm(n, 25, 5)
x3 = rnorm(n, 35, 5); x4 = rnorm(n, 50, 5)
x = c(x1, x2, x3, x4)   # all 40 observations pooled
shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.93777, p-value = 0.0291
Taken together, the 40 observations have a normal mixture distribution, which need not be normal. See the Wikipedia page on mixture distributions, especially the figure near the top of the page.
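To see the mixture shape directly (a quick visual check I am adding here, not part of the original answer), the pooled data can be compared with a single fitted normal curve:
m = mean(x); s = sd(x)
hist(x, prob = TRUE, breaks = 12, main = "Pooled observations")
curve(dnorm(x, m, s), add = TRUE, lwd = 2)   # single normal fit; x here is curve's plotting grid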
Looking at residuals. For this simple model, the residuals are found by subtracting the mean for each group from each observation in the group. The 40 residuals pass the Shapiro-Wilk test.
r1 = x1 - mean(x1); r2 = x2 - mean(x2)
r3 = x3 - mean(x3); r4 = x4 - mean(x4)
r = c(r1, r2, r3, r4)
shapiro.test(r)
Shapiro-Wilk normality test
data: r
W = 0.98231, p-value = 0.7743
ANOVA Significant. Because the group population means are quite different,
a one-way ANOVA on my fake data shows a highly significant effect.
gp = as.factor(rep(1:4, each=10))
lm.out = lm(x ~ gp); anova(lm.out)
Analysis of Variance Table
Response: x
Df Sum Sq Mean Sq F value Pr(>F)
gp 3 5655.9 1885.31 62.167 2.596e-14 ***
Residuals 36 1091.8 30.33
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
It is precisely in the cases where there is a significant effect that
the aggregated data from all groups are likely to
fail the Shapiro-Wilk normality test.
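Equivalently (a small check I am adding, not part of the original answer), the residuals can be taken straight from the fitted model object and tested; they agree with the group-mean-centered values above:
shapiro.test(residuals(lm.out))               # same test on the residuals extracted by lm()
all.equal(as.numeric(residuals(lm.out)), r)   # should be TRUE up to numerical tolerance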
Best Answer
If I understand correctly, you have one predictor (explanatory variable $x$) and one criterion (predicted variable $y$) in a simple linear regression. The significance tests rest on the model assumption that for each observation $i$ $$ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} $$ where $\beta_{0}, \beta_{1}$ are the parameters we want to estimate and test hypotheses about, and the errors $\epsilon_{i} \sim N(0, \sigma^{2})$ are normally distributed random variables with mean 0 and constant variance $\sigma^{2}$. All $\epsilon_{i}$ are assumed to be independent of each other and of the $x_{i}$. The $x_{i}$ themselves are assumed to be error-free.
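As a minimal sketch of that model (my own code; the parameter values 3, 0.7 and 2 are arbitrary), one can simulate data under these assumptions and fit the regression:
set.seed(42)
n_obs = 100
x_sim = runif(n_obs, 0, 10)                    # error-free predictor
y_sim = 3 + 0.7 * x_sim + rnorm(n_obs, 0, 2)   # beta0 = 3, beta1 = 0.7, sigma = 2
summary(lm(y_sim ~ x_sim))                     # estimates and tests for beta0, beta1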
You used the term "homogeneity of variances" which is typically used when you have distinct groups (as in ANOVA), i.e., when the $x_{i}$ only take on a few distinct values. In the context of regression, where $x$ is continuous, the assumption that the error variance is $\sigma^{2}$ everywhere is called homoscedasticity. This means that all conditional error distributions have the same variance. This assumption cannot be tested with a test for distinct groups (Fligner-Killeen, Levene).
The idea of identical conditional error distributions can be illustrated with a diagram showing the same normal error distribution centered on the regression line at every value of $x$.
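A rough stand-in for such a figure (my own sketch, not the answer's original code; the line y = 1 + 0.8x and sigma = 1 are arbitrary) draws the same sideways normal density at a few values of $x$:
plot(c(0, 10), c(-2, 12), type = "n", xlab = "x", ylab = "y")
abline(1, 0.8, lwd = 2)                        # regression line y = 1 + 0.8 x
for (x0 in c(2, 5, 8)) {
  e = seq(-3, 3, length.out = 100)
  lines(x0 + 2 * dnorm(e), 1 + 0.8 * x0 + e)   # identical error density, drawn sideways at x0
}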
Tests for heteroscedasticity include the Breusch-Pagan-Godfrey test (bptest() from package lmtest or ncvTest() from package car) and the White test (white.test() from package tseries). You can also consider just using heteroscedasticity-consistent standard errors (modified White estimator, see function hccm() from package car or vcovHC() from package sandwich). These standard errors can then be used in combination with function coeftest() from package lmtest, as described on pages 184-186 of Fox & Weisberg (2011), An R Companion to Applied Regression. You could also just plot the empirical residuals (or some transform thereof) against the fitted values. Typical transforms are the studentized residuals (spread-level plot) or the square root of the absolute residuals (scale-location plot). These plots should not reveal an obvious trend in the residual distribution that depends on the fitted values.
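As a hedged sketch of that workflow (assuming a fitted lm object named fit; the choice of type = "HC3" is mine, not from the answer):
library(lmtest)      # bptest(), coeftest()
library(sandwich)    # vcovHC()
bptest(fit)                                       # Breusch-Pagan test for heteroscedasticity
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))   # heteroscedasticity-consistent standard errors
plot(fit, which = 3)                              # scale-location plot: sqrt(|standardized residuals|) vs fitted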