In order to make sure that I can use a parametric test, I need to make sure that my residual distribution is normal.
There is really no way to demonstrate that you have exact normality, but that's okay because approximate normality will generally be sufficient for hypothesis tests in regression to work the way you want.
However, the skewness and kurtosis of the residuals are -0.017 and -0.438 respectively, which I think would be considered consistent with normality.
You can obtain values like that with residuals from a simple regression on normal data, but the kurtosis is just significant at the 5% level.
(Technical aside: I used simulation to assess the significance of the kurtosis of residuals here; not knowing the number of predictors, I did it for both independent normals and for one predictor at the given sample size, both showed essentially the same p-value; results should be similar for regression with small numbers of predictors.)
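The simulation approach mentioned in the aside can be sketched as follows. This is not the answerer's actual code, and the question's sample size isn't given, so `n = 100` is a stand-in; only the reported kurtosis value comes from the thread.

```python
# Monte Carlo p-value for sample excess kurtosis under normality.
# n = 100 is a placeholder sample size -- the question's actual n isn't stated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sim = 100, 20_000

# Null distribution of excess kurtosis for i.i.d. normal samples of size n
null_kurt = np.array([stats.kurtosis(rng.standard_normal(n))
                      for _ in range(n_sim)])

observed = -0.438  # the excess kurtosis reported in the question
p_value = np.mean(np.abs(null_kurt) >= abs(observed))
print(f"two-sided Monte Carlo p-value: {p_value:.3f}")
```

Whether this comes out significant depends strongly on the actual sample size, which is why the aside had to try a couple of setups.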
This doesn't actually suggest a problem with the inference when doing a regression or correlation, however. Your data won't be exactly normal; the essential question is 'are the data so badly non-normal that the inference no longer has the properties you wish?'
Unfortunately, when I run a Kolmogorov-Smirnov test, the significance value is 0.021, which suggests the residuals are not normal.
What were the specified population mean and variance of the residuals for your KS test and how did you get such population values?
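The point behind this question can be demonstrated directly: if you estimate the mean and standard deviation from the same data you test, the standard KS p-values are no longer valid (they come out too large, so the test rejects far less often than its nominal level). A minimal sketch, using simulated normal data:

```python
# Naive KS test with parameters estimated from the same sample.
# Data are genuinely normal, so a valid 5%-level test should reject
# about 5% of the time; the naive version rejects far less often.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sim = 50, 2_000
naive_p = np.empty(n_sim)
for i in range(n_sim):
    x = rng.standard_normal(n)
    # compare against a normal whose mean/sd were fitted to x itself
    naive_p[i] = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))).pvalue

rejection_rate = np.mean(naive_p < 0.05)
print(f"naive KS rejection rate at the 5% level: {rejection_rate:.3f}")
```

A Lilliefors-corrected test (or Shapiro-Wilk) is the usual fix when the parameters are estimated rather than specified in advance.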
Could anybody please explain what I should do?
I suggest you don't do a hypothesis test to assess the suitability of the assumption of normality, but instead look at diagnostic displays that show you how badly non-normal the data are.
Some pointers:
See the points here.
Also see the discussion on this question.
See the comments under this answer, and the advice in this answer.
Consider this advice.
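One such diagnostic display is a normal Q-Q plot of the residuals. A minimal sketch, with simulated residuals standing in for the real ones:

```python
# Normal Q-Q plot quantities for a set of residuals.
# Residuals here are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
resid = rng.standard_normal(80)          # stand-in residuals

(osm, osr), (slope, intercept, r) = stats.probplot(resid)
# osm: theoretical normal quantiles; osr: ordered residuals.
# Plot osr against osm (e.g. with matplotlib): points lying near a
# straight line indicate approximate normality, and r near 1 backs
# that up.  The judgment is about *how far* the points bend away,
# not a binary reject/accept decision.
print(f"Q-Q correlation: {r:.3f}")
```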
The difficulty is that skewness and kurtosis are dependent; their effects can't be completely separated. If you want to examine the effect of a highly skewed distribution, you must also accept a distribution with high kurtosis.
In particular, kurtosis* $\geq$ skewness$^2+1$.
* (ordinary scaled fourth moment kurtosis, not excess kurtosis)
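The inequality can be checked numerically against the exact moments of a few familiar distributions (these examples are mine, not from the answer):

```python
# Check kurtosis >= skewness^2 + 1, where "kurtosis" is the ordinary
# fourth-moment version (excess kurtosis + 3).
from scipy import stats

for dist in [stats.norm, stats.expon, stats.lognorm(1.0), stats.chi2(3)]:
    s, ex_k = dist.stats(moments='sk')   # skewness, excess kurtosis
    k = float(ex_k) + 3                  # ordinary kurtosis
    bound = float(s) ** 2 + 1
    print(f"skew={float(s):6.3f}  kurtosis={k:8.3f}  bound={bound:8.3f}")
    assert k >= bound
```

The exponential distribution, for example, has skewness 2 and ordinary kurtosis 9, comfortably above the bound of 5.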
Khan and Rayner (which is mentioned in the earlier answer) work with a family that allows some exploration of the impact of skewness and kurtosis, but they cannot avoid this issue, so their attempt to separate them severely limits the extent to which the effect of skewness can be explored.
If one holds the kurtosis ($\beta_2$) constant, one cannot make the skewness more than $\sqrt{\beta_2-1}$. If one wishes to consider unimodal distributions, the skewness is even more restricted.
For example, if you want to see the effect of high skewness (say, skewness > 5), you cannot get a distribution with kurtosis less than 26!
So if you want to investigate the impact of high skewness, you cannot avoid investigating the impact of high kurtosis. Consequently, if you do try to separate them, you in effect make yourself unable to assess the effect of increasing skewness to high levels.
That said, at least for the distribution family they considered, and within the limits that the relationship between them poses, the investigation by Khan and Rayner does seem to suggest that kurtosis is the main problem.
However, even if that conclusion is completely general, if you happen to have a distribution with (say) skewness 5, it's likely to be little comfort to hear "but it's not the skewness that's the problem!" -- once your skewness exceeds $\sqrt{2}$, the kurtosis can no longer be that of the normal, and beyond that point the minimum possible kurtosis grows rapidly with increasing skewness.
Best Answer
This can be done using the sinh-arcsinh transformation of Jones and Pewsey (2009).
The transformation is defined as
$$H(x;\epsilon,\delta)=\sinh[\delta\sinh^{-1}(x)-\epsilon], \tag{$\star$}$$
where $\epsilon \in{\mathbb R}$ and $\delta \in {\mathbb R}_+$. When this transformation is applied to the normal CDF $S(x;\epsilon,\delta)=\Phi[H(x;\epsilon,\delta)]$, it produces a unimodal distribution whose parameters $(\epsilon,\delta)$ control skewness and kurtosis, respectively (Jones and Pewsey, 2009), in the sense of van Zwet (1969). In addition, if $\epsilon=0$ and $\delta=1$, we obtain the original normal distribution. See the following R code.
Therefore, by choosing an appropriate sequence of parameters $(\epsilon_n,\delta_n)$, you can generate a sequence of distributions/transformations with different levels of skewness and kurtosis and make them look as similar or as different to the normal distribution as you want.
The following plot shows the outcome produced by the R code for (i) $\epsilon=(-2,-1,0,1,2)$ and $\delta=1$, and (ii) $\epsilon=0$ and $\delta=(0.5,0.75,1,1.25,1.5)$.
Simulation from this distribution is straightforward, since you only have to transform a normal sample using the inverse of $(\star)$:
$$H^{-1}(x;\epsilon,\delta)=\sinh[\delta^{-1}(\sinh^{-1}(x)+\epsilon)]$$
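The construction above can be sketched in Python (the answer's original R code is not reproduced here; this is a translation of the formulas, with my own parameter choices for the demonstration):

```python
# Sinh-arcsinh transformation of Jones and Pewsey (2009):
# H(x) from (star), its inverse for simulation, and the CDF S = Phi(H).
import numpy as np
from scipy import stats

def H(x, eps, delta):
    """The transformation (star)."""
    return np.sinh(delta * np.arcsinh(x) - eps)

def H_inv(x, eps, delta):
    """Its inverse, used to simulate from the distribution."""
    return np.sinh((np.arcsinh(x) + eps) / delta)

def sas_cdf(x, eps, delta):
    """S(x; eps, delta) = Phi(H(x; eps, delta))."""
    return stats.norm.cdf(H(x, eps, delta))

# eps = 0, delta = 1 recovers the standard normal CDF
x = np.linspace(-3, 3, 7)
assert np.allclose(sas_cdf(x, 0.0, 1.0), stats.norm.cdf(x))

# Simulation: push a normal sample through the inverse of (star);
# a positive eps stretches the right tail, giving positive skewness.
rng = np.random.default_rng(3)
z = rng.standard_normal(100_000)
sample = H_inv(z, eps=1.0, delta=1.0)
print(f"sample skewness: {stats.skew(sample):.2f}")
```

Varying `eps` and `delta` over a grid then yields the sequence of distributions described above, from arbitrarily close to the normal to strongly skewed or heavy-tailed.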