Solved – Departure from normality assumption in ANOVA: is kurtosis or skewness more important

Tags: anova, kurtosis, normality-assumption, skewness

Applied Linear Statistical Models by Kutner et al. states the following concerning departures from the normality assumption of ANOVA models: "Kurtosis of the error distribution (either more or less peaked than a normal distribution) is more important than skewness of the distribution in terms of the effects on inferences."

I'm a bit puzzled by this statement and could not find any related information, either in the book or online. I'm confused because I also learned that QQ-plots showing heavy tails usually mean the normality assumption is still "good enough" for linear regression models, whereas skewed QQ-plots are more of a concern (i.e. a transformation might be appropriate).

Am I correct that the same reasoning goes for ANOVA and that their wording ("more important in terms of the effects on inferences") was just poorly chosen? That is, a skewed distribution has more severe consequences and should be avoided, whereas a small amount of kurtosis can be acceptable.

EDIT: As rolando2 pointed out, it's hard to state that one is more important than the other in all cases, but I'm merely looking for some general insight. My main issue is that I was taught that in simple linear regression, QQ-plots with heavier tails (=kurtosis?) are OK, since the F-test is quite robust against this. On the other hand, skewed QQ-plots (parabola-shaped) are usually a bigger concern. This seems to go directly against the guidelines my textbook provides for ANOVA, even though ANOVA models can be written as regression models and should have the same assumptions.

I'm convinced I'm overlooking something or I have a false assumption, but I cannot figure out what it might be.

Best Answer

The difficulty is that skewness and kurtosis are dependent; their effects can't be completely separated.

If you want to examine the effect of a highly skewed distribution, that distribution must also have high kurtosis.

In particular, kurtosis $\geq$ skewness$^2+1$, where kurtosis here is the ordinary scaled fourth moment, not excess kurtosis.

Khan and Rayner (mentioned in the earlier answer) work with a family that allows some exploration of the impact of skewness and kurtosis, but they cannot avoid this issue, so their attempt to separate the two severely limits the extent to which the effect of skewness can be explored.

If one holds the kurtosis ($\beta_2$) constant, the skewness cannot exceed $\sqrt{\beta_2-1}$ in absolute value. If one wishes to consider unimodal distributions, the skewness is even more restricted.

For example, if you want to see the effect of high skewness - say skewness > 5, you cannot get a distribution with kurtosis less than 26!
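The bound is easy to check numerically. The following is a minimal sketch in R; the helper names `min_kurtosis`, `max_abs_skewness` and `bernoulli_moments` are made up for illustration. It computes the smallest kurtosis compatible with a given skewness and checks that a two-point (Bernoulli) distribution sits exactly on the bound.

```r
# Smallest attainable (non-excess) kurtosis for a given skewness,
# from kurtosis >= skewness^2 + 1.
min_kurtosis <- function(skewness) skewness^2 + 1

# Largest attainable |skewness| when the kurtosis is held fixed.
max_abs_skewness <- function(kurtosis) sqrt(kurtosis - 1)

# Exact skewness and (non-excess) kurtosis of a Bernoulli(p) variable.
bernoulli_moments <- function(p) {
  q <- 1 - p
  c(skewness = (q - p) / sqrt(p * q),
    kurtosis = (1 - 3 * p * q) / (p * q))
}

min_kurtosis(5)       # 26
max_abs_skewness(3)   # sqrt(2): the most skewness possible at normal kurtosis

# A two-point distribution attains the bound (difference is 0 up to rounding).
m <- bernoulli_moments(0.1)
m[["kurtosis"]] - min_kurtosis(m[["skewness"]])
```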

So if you want to investigate the impact of high skewness, you cannot avoid investigating the impact of high kurtosis. Consequently, if you do try to separate them, you in effect leave yourself unable to assess the effect of increasing skewness to high levels.

That said, at least for the distribution family they considered, and within the limits that the relationship between them poses, the investigation by Khan and Rayner does seem to suggest that kurtosis is the main problem.

However, even if the conclusion is completely general, if you happen to have a distribution with (say) skewness 5, it's likely to be little comfort to say "but it's not the skewness that's the problem!" Once your skewness is $>\sqrt{2}$, the kurtosis can no longer equal that of the normal (3), and beyond that, the minimum possible kurtosis grows rapidly with increasing skewness.
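One way to get a feel for the size of these effects is a small simulation. The sketch below is not taken from Khan and Rayner; it is an illustration under assumed settings (three groups of 10 observations, no true group differences, 2000 replications, nominal level 5%, and a hypothetical helper `type1_rate`). It estimates the rejection rate of the one-way ANOVA F-test for normal errors, centred exponential errors (skewness 2, kurtosis 9) and $t_5$ errors (skewness 0, kurtosis 9), so the two non-normal cases share the same kurtosis and differ only in skewness.

```r
set.seed(1)

# Estimated type I error rate of the classical one-way ANOVA F-test
# when all group means are equal and the errors are drawn by rerr().
type1_rate <- function(rerr, k = 3, n_per_group = 10,
                       n_sim = 2000, alpha = 0.05) {
  g <- factor(rep(seq_len(k), each = n_per_group))
  p_values <- replicate(n_sim, {
    d <- data.frame(y = rerr(k * n_per_group), g = g)
    oneway.test(y ~ g, data = d, var.equal = TRUE)$p.value
  })
  mean(p_values < alpha)
}

rates <- c(
  normal      = type1_rate(rnorm),
  exponential = type1_rate(function(n) rexp(n) - 1),   # skewness 2, kurtosis 9
  t5          = type1_rate(function(n) rt(n, df = 5))  # skewness 0, kurtosis 9
)
print(rates)
```

Rates close to 0.05 indicate that the F-test holds its nominal level under that error distribution. Changing the group sizes or the error distributions shows how much this depends on the design.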

Related Solutions

Solved – Transformation to increase kurtosis and skewness of normal r.v

This can be done using the sinh-arcsinh transformation from

Jones, M. C. and Pewsey A. (2009). Sinh-arcsinh distributions. Biometrika 96: 761–780.

The transformation is defined as

$$H(x;\epsilon,\delta)=\sinh[\delta\sinh^{-1}(x)-\epsilon], \tag{$\star$}$$

where $\epsilon \in{\mathbb R}$ and $\delta \in {\mathbb R}_+$. When this transformation is applied to the normal CDF $S(x;\epsilon,\delta)=\Phi[H(x;\epsilon,\delta)]$, it produces a unimodal distribution whose parameters $(\epsilon,\delta)$ control skewness and kurtosis, respectively (Jones and Pewsey, 2009), in the sense of van Zwet (1969). In addition, if $\epsilon=0$ and $\delta=1$, we obtain the original normal distribution. See the following R code.

# Density of the sinh-arcsinh distribution, i.e. the derivative of Phi[H(x; epsilon, delta)]
fs = function(x,epsilon,delta) dnorm(sinh(delta*asinh(x)-epsilon))*delta*cosh(delta*asinh(x)-epsilon)/sqrt(1+x^2)

# (i) Vary the skewness parameter epsilon, holding delta = 1
vec = seq(-15,15,0.001)

plot(vec,fs(vec,0,1),type="l")
lines(vec,fs(vec,1,1),col="red")
lines(vec,fs(vec,2,1),col="blue")
lines(vec,fs(vec,-1,1),col="red")
lines(vec,fs(vec,-2,1),col="blue")

# (ii) Vary the tail-weight parameter delta, holding epsilon = 0
vec = seq(-5,5,0.001)

plot(vec,fs(vec,0,0.5),type="l",ylim=c(0,1))
lines(vec,fs(vec,0,0.75),col="red")
lines(vec,fs(vec,0,1),col="blue")
lines(vec,fs(vec,0,1.25),col="red")
lines(vec,fs(vec,0,1.5),col="blue")

Therefore, by choosing an appropriate sequence of parameters $(\epsilon_n,\delta_n)$, you can generate a sequence of distributions/transformations with different levels of skewness and kurtosis and make them look as similar to or as different from the normal distribution as you want.
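To see how a given $(\epsilon,\delta)$ translates into skewness and kurtosis, the moments can be computed numerically. The sketch below (the function name `sas_moments` is an assumption for illustration) uses the fact that a variable with this distribution can be written as $X=\sinh[(\sinh^{-1}(Z)+\epsilon)/\delta]$ with $Z$ standard normal, and integrates functions of $X$ against the normal density with `integrate`.

```r
# Skewness and (non-excess) kurtosis of the sinh-arcsinh distribution,
# obtained by numerical integration over the underlying standard normal.
sas_moments <- function(epsilon, delta) {
  h_inv  <- function(z) sinh((asinh(z) + epsilon) / delta)
  expect <- function(g) {
    integrate(function(z) g(h_inv(z)) * dnorm(z), -Inf, Inf)$value
  }
  mu      <- expect(function(x) x)
  central <- function(k) expect(function(x) (x - mu)^k)
  v <- central(2)
  c(skewness = central(3) / v^1.5, kurtosis = central(4) / v^2)
}

sas_moments(0, 1)     # normal case: skewness 0, kurtosis 3
sas_moments(1, 1)     # epsilon > 0 gives positive skewness
sas_moments(0, 0.5)   # delta < 1 gives heavier tails: kurtosis about 8.6
```

For $\epsilon=0$, $\delta=0.5$ the value can be verified by hand, since $X=2Z\sqrt{1+Z^2}$ gives $E[X^2]=16$ and $E[X^4]=2208$, so the kurtosis is $2208/256=8.625$.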

The R code above produces two plots: (i) $\epsilon=(-2,-1,0,1,2)$ with $\delta=1$, and (ii) $\epsilon=0$ with $\delta=(0.5,0.75,1,1.25,1.5)$.

[Figure: sinh-arcsinh densities for the parameter values listed above]

Simulating from this distribution is straightforward: transform a normal sample using the inverse of $(\star)$.

$$H^{-1}(x;\epsilon,\delta)=\sinh[\delta^{-1}(\sinh^{-1}(x)+\epsilon)]$$
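A minimal sketch of that simulation step in R follows. The names `H`, `H_inv` and `rsas` are chosen here for illustration and do not come from a package.

```r
# H from (star) and its inverse.
H     <- function(x, epsilon, delta) sinh(delta * asinh(x) - epsilon)
H_inv <- function(z, epsilon, delta) sinh((asinh(z) + epsilon) / delta)

# Draw n values by pushing standard normal draws through H_inv.
rsas <- function(n, epsilon, delta) H_inv(rnorm(n), epsilon, delta)

set.seed(42)
x <- rsas(1e4, epsilon = 1, delta = 1)

# Sanity check: mapping the sample back through H should give values
# that look standard normal (mean near 0, standard deviation near 1).
z <- H(x, epsilon = 1, delta = 1)
c(mean = mean(z), sd = sd(z))
```

Setting `epsilon = 0` and `delta = 1` makes `H_inv` the identity, so `rsas` then returns plain normal draws.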

Solved – Incorrect Kurtosis, Skewness and coefficient Bimodality values

I agree with @NickCox: I think the mistake is in the first line of your post, where you define the "bimodality coefficient" (BC). I Googled and found Pfister et al. (which references SAS/STAT from 1990). That paper reports problems with BC quite similar to the ones you found and recommends Hartigan's dip test instead of, or in addition to, BC. The dip test is available in R through the diptest package. In addition, the kurtosis in the formula is supposed to be excess kurtosis, and you appear not to have adjusted for that (although I am not certain of this).

The SAS documentation also mentions problems with BC, in particular:

"Very heavy-tailed distributions have small values of [the bimodality coefficient] regardless of the number of modes."

The long tail of your second distribution is probably lowering the value of BC.

In short, the problem is in the formula, not in your code. There is, as far as I know, no perfect measure of the number of modes.
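To make the point about excess kurtosis and heavy tails concrete, here is a sketch of the population form of the coefficient, $(\gamma_1^2+1)/(\gamma_2+3)$, with $\gamma_1$ the skewness and $\gamma_2$ the excess kurtosis. The function name `bimodality_coef` is made up, it uses plain moment estimators, and it omits the finite-sample correction of the SAS version, so treat it as an illustration rather than a reimplementation. The dip test call assumes the `diptest` package is installed and is skipped otherwise.

```r
# Population-form bimodality coefficient from simple moment estimators.
bimodality_coef <- function(x) {
  z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  skewness        <- mean(z^3)
  excess_kurtosis <- mean(z^4) - 3   # excess, not raw, kurtosis
  (skewness^2 + 1) / (excess_kurtosis + 3)
}

set.seed(1)
uniform        <- runif(1e5)
bimodal        <- c(rnorm(5e3, mean = -3), rnorm(5e3, mean = 3))
unimodal_heavy <- rt(1e4, df = 5)

bimodality_coef(uniform)          # close to 5/9, the value for a uniform
bimodality_coef(bimodal)          # above 5/9
bimodality_coef(unimodal_heavy)   # well below 5/9 because of the heavy tails

# Hartigan's dip test, if the diptest package is available.
if (requireNamespace("diptest", quietly = TRUE)) {
  print(diptest::dip.test(bimodal))
}
```

Because heavy tails inflate the denominator, a long-tailed sample gets a small coefficient whatever its number of modes, which is the behaviour described in the SAS documentation.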
