Solved – Transformation to increase kurtosis and skewness of normal r.v

data-transformation, kurtosis, normality-assumption, skewness

I'm working on an algorithm that relies on the assumption that the observations $Y$ are normally distributed, and I would like to test the robustness of the algorithm to this assumption empirically.

To do this, I was looking for a sequence of transformations $T_1(), \dots, T_n()$ that would progressively disrupt the normality of $Y$. For example, if the $Y$s are normal they have skewness $= 0$ and kurtosis $= 3$, and it would be nice to find a sequence of transformations that progressively increases both.

My idea was to simulate some approximately normally distributed data $Y$ and test the algorithm on that, then test the algorithm on each transformed dataset $T_1(Y), \dots, T_n(Y)$ to see how much the output changes.

Notice that I don't control the distribution of the simulated $Y$s, so I cannot simulate them using a distribution that generalizes the Normal (such as the Skewed Generalized Error Distribution).

Best Answer

This can be done using the sinh-arcsinh transformation from

Jones, M. C. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika 96: 761–780.

The transformation is defined as

$$H(x;\epsilon,\delta)=\sinh[\delta\sinh^{-1}(x)-\epsilon], \tag{$\star$}$$

where $\epsilon \in{\mathbb R}$ and $\delta \in {\mathbb R}_+$. Composing this transformation with the normal CDF, $S(x;\epsilon,\delta)=\Phi[H(x;\epsilon,\delta)]$, produces a unimodal distribution whose parameters $(\epsilon,\delta)$ control skewness and kurtosis, respectively (Jones and Pewsey, 2009), in the sense of van Zwet (1969). Positive $\epsilon$ gives positive skewness and negative $\epsilon$ gives negative skewness; $\delta<1$ gives tails heavier than the normal and $\delta>1$ gives lighter tails. If $\epsilon=0$ and $\delta=1$, we obtain the original normal distribution. The R code below plots the corresponding densities.
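For reference, the density that the function `fs` in the R code evaluates follows from differentiating $S$ with the chain rule. The symbols $s$ and $\phi$ (the standard normal density) are notation introduced here, not taken from the original answer:

$$s(x;\epsilon,\delta)=\frac{d}{dx}\Phi[H(x;\epsilon,\delta)]=\phi[H(x;\epsilon,\delta)]\,\frac{\delta\cosh[\delta\sinh^{-1}(x)-\epsilon]}{\sqrt{1+x^2}},$$

using $\frac{d}{dx}\sinh^{-1}(x)=1/\sqrt{1+x^2}$ and $\frac{d}{du}\sinh(u)=\cosh(u)$.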

# Density of the sinh-arcsinh transformed normal: dnorm(H(x)) * dH/dx
fs = function(x,epsilon,delta) dnorm(sinh(delta*asinh(x)-epsilon))*delta*cosh(delta*asinh(x)-epsilon)/sqrt(1+x^2)

# (i) Varying skewness: epsilon = -2, -1, 0, 1, 2 with delta = 1
vec = seq(-15,15,0.001)

plot(vec,fs(vec,0,1),type="l")
lines(vec,fs(vec,1,1),col="red")
lines(vec,fs(vec,2,1),col="blue")
lines(vec,fs(vec,-1,1),col="red")
lines(vec,fs(vec,-2,1),col="blue")

# (ii) Varying tail weight: delta = 0.5, 0.75, 1, 1.25, 1.5 with epsilon = 0
vec = seq(-5,5,0.001)

plot(vec,fs(vec,0,0.5),type="l",ylim=c(0,1))
lines(vec,fs(vec,0,0.75),col="red")
lines(vec,fs(vec,0,1),col="blue")
lines(vec,fs(vec,0,1.25),col="red")
lines(vec,fs(vec,0,1.5),col="blue")

Therefore, by choosing an appropriate sequence of parameters $(\epsilon_n,\delta_n)$, you can generate a sequence of distributions/transformations with different levels of skewness and kurtosis and make them look as similar to or as different from the normal distribution as you want.
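To help pick such a sequence, here is a minimal sketch that computes the skewness and kurtosis implied by a given $(\epsilon,\delta)$ by numerical integration. It relies on the fact that if $Z$ is standard normal, then $X=\sinh[(\sinh^{-1}(Z)+\epsilon)/\delta]$ has CDF $S(x;\epsilon,\delta)$. The helper names `sas_inverse` and `sas_shape`, the integration limit of 12, and the parameter grid are illustrative assumptions, not part of the original answer.

```r
# Maps a standard normal value z to the sinh-arcsinh scale
# (hypothetical helper name, introduced for this sketch).
sas_inverse <- function(z, epsilon, delta) sinh((asinh(z) + epsilon) / delta)

# Skewness and kurtosis of sas_inverse(Z) for Z ~ N(0, 1), obtained by
# integrating the first four raw moments against the normal density.
# The finite limit is an assumption: the normal density is negligible beyond it.
sas_shape <- function(epsilon, delta, limit = 12) {
  raw <- sapply(1:4, function(k) {
    integrate(function(z) sas_inverse(z, epsilon, delta)^k * dnorm(z),
              lower = -limit, upper = limit)$value
  })
  mu <- raw[1]
  m2 <- raw[2] - mu^2                                       # variance
  m3 <- raw[3] - 3 * mu * raw[2] + 2 * mu^3                 # 3rd central moment
  m4 <- raw[4] - 4 * mu * raw[3] + 6 * mu^2 * raw[2] - 3 * mu^4
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2)
}

# Illustrative parameter sequence starting at the normal case (0, 1).
params <- data.frame(epsilon = c(0, 0.25, 0.5, 0.75, 1),
                     delta   = c(1, 0.9, 0.8, 0.7, 0.6))
shape <- t(mapply(sas_shape, params$epsilon, params$delta))
print(cbind(params, shape))
```

The first row should reproduce skewness 0 and kurtosis 3 up to integration error; the remaining rows show how far each parameter pair moves from normality, so the grid can be adjusted until the steps are as fine as needed.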

The following plots show the output of the R code above for (i) $\epsilon=(-2,-1,0,1,2)$ with $\delta=1$, and (ii) $\epsilon=0$ with $\delta=(0.5,0.75,1,1.25,1.5)$.

[Figure: densities for $\epsilon=(-2,-1,0,1,2)$ and $\delta=1$]

[Figure: densities for $\epsilon=0$ and $\delta=(0.5,0.75,1,1.25,1.5)$]

Simulating from this distribution is straightforward: you just have to transform a normal sample using the inverse of $(\star)$:

$$H^{-1}(x;\epsilon,\delta)=\sinh[\delta^{-1}(\sinh^{-1}(x)+\epsilon)]$$
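As a sketch of how this could be applied to the question, the code below takes a sample `Y`, standardizes it, and applies $H^{-1}$ for a sequence of parameters to obtain $T_1(Y), \dots, T_n(Y)$. The simulated `Y`, the standardization step, the helper names, and the parameter values are all assumptions made for illustration; in practice `Y` would be the data you already have.

```r
set.seed(1)

# Inverse sinh-arcsinh map (hypothetical helper name for this sketch).
sas_inverse <- function(z, epsilon, delta) sinh((asinh(z) + epsilon) / delta)

# Sample skewness and kurtosis (plain moment estimators, no bias correction).
sample_shape <- function(x) {
  d <- x - mean(x)
  c(skewness = mean(d^3) / mean(d^2)^1.5,
    kurtosis = mean(d^4) / mean(d^2)^2)
}

# Stand-in for the observed, approximately normal data.
Y <- rnorm(1e5, mean = 10, sd = 2)
Z <- (Y - mean(Y)) / sd(Y)  # standardize so that (epsilon, delta) = (0, 1) leaves Z unchanged

# Hypothetical parameter sequence (epsilon_k, delta_k).
eps   <- c(0, 0.3, 0.6, 0.9)
delta <- c(1, 0.9, 0.8, 0.7)

# One column per transformed dataset T_k(Y).
transformed <- mapply(function(e, d) sas_inverse(Z, e, d), eps, delta)
shapes <- t(apply(transformed, 2, sample_shape))
print(shapes)
```

Each column of `transformed` can then be passed to the algorithm under test. Note that the transformed data live on the standardized scale; whether to map them back to the original location and scale depends on what the algorithm expects.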