The sample skewness $$\gamma=\frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^3}{\Big(\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\Big)^{3/2}}$$ and the sample (excess) kurtosis $$\kappa=\frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^4}{\Big(\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\Big)^{2}}-3$$ are often used as measures of non-normality. (Note the $\frac{1}{n}$ factors: without them the ratios would shrink with the sample size rather than estimate the population moment ratios.)
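These moment ratios are easy to compute directly. A minimal sketch in Python (NumPy assumed); the function names are mine, not from any particular library:

```python
import numpy as np

def sample_skewness(x):
    """Third central moment over the 3/2 power of the second central moment."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def sample_excess_kurtosis(x):
    """Fourth central moment over the squared second central moment, minus 3."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
# For normal data both statistics should be close to 0.
print(sample_skewness(x), sample_excess_kurtosis(x))
```

These definitions match `scipy.stats.skew` and `scipy.stats.kurtosis` with their default (biased, Fisher) settings.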
The sample skewness measures the asymmetry of the empirical distribution: if it is far from $0$, the distribution is markedly asymmetric. Since the normal distribution is symmetric, the sample skewness of a sample from a normal distribution should be close to $0$.
The sample kurtosis measures the "peakedness" of the distribution. If it is much greater than $0$, the distribution is more peaked than the normal distribution, which typically means that it has heavier tails. If it is less than $0$, the distribution is less peaked, which typically means that it is bimodal. The sample kurtosis is bounded from below by $-2$, a value attained by a symmetric two-point distribution, which of course is extremely bimodal.
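Both extremes are easy to check by simulation. A sketch using the moment-ratio definition above (a symmetric two-point sample for the lower bound, and a Laplace sample, whose theoretical excess kurtosis is $3$, for the heavy-tailed case):

```python
import numpy as np

def excess_kurtosis(x):
    d = x - np.mean(x)
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3

rng = np.random.default_rng(1)

# Extreme bimodality: equal mass at -1 and +1 sits at the lower bound of -2.
two_point = rng.choice([-1.0, 1.0], size=100_000)
print(excess_kurtosis(two_point))   # close to -2

# Heavier tails than the normal: the Laplace distribution (excess kurtosis 3).
heavy = rng.laplace(size=100_000)
print(excess_kurtosis(heavy))       # clearly positive
```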
Here are two examples (normal distribution in grey, other distributions in red):
The skewed distribution has theoretical skewness $1.6$, whereas the kurtotic distribution has theoretical (excess) kurtosis $1.5$. As you can see, the kurtotic distribution has heavier tails than the normal distribution.
So, why use skewness and kurtosis as quantifications of non-normality? The main reason is that they affect the asymptotics of the central limit theorem, which, as you may know, can often be used to motivate a statistical procedure based on normality even when the data do not come from a normal distribution, provided the sample is "large enough". If either the skewness or the kurtosis is high, larger sample sizes are needed for such motivations to be valid.
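One way to see the role of skewness here: the skewness of a mean of $n$ i.i.d. observations is $\gamma/\sqrt{n}$, so the more skewed the data, the larger $n$ must be before the sampling distribution of the mean looks normal. A quick simulation sketch, using the exponential distribution (theoretical skewness $2$) for illustration:

```python
import numpy as np

def skewness(x):
    d = x - np.mean(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(2)

# Exponential(1) has theoretical skewness 2; the mean of n i.i.d. draws
# has skewness 2/sqrt(n), so the normal approximation improves only slowly.
for n in (5, 50, 500):
    means = rng.exponential(size=(20_000, n)).mean(axis=1)
    print(n, skewness(means))   # roughly 2/sqrt(n)
```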
For some inferential procedures you need to worry more about skewness, and for some you need to worry about heavy tails (kurtosis). I've written more about that elsewhere on this site.
In order to make sure that I can use a parametric test, I need to make sure that my residual distribution is normal.
There is really no way to demonstrate that you have exact normality, but that's okay because approximate normality will generally be sufficient for hypothesis tests in regression to work the way you want.
However, when I refer to the value of skewness and kurtosis of the residual, it is -0.017 and -0.438 respectively, where i think this is considered as normal.
You can obtain values like that with residuals from a simple regression on normal data, but the kurtosis is just significant at the 5% level.
(Technical aside: I used simulation to assess the significance of the kurtosis of residuals here; not knowing the number of predictors, I did it for both independent normals and for one predictor at the given sample size, both showed essentially the same p-value; results should be similar for regression with small numbers of predictors.)
This doesn't actually suggest a problem with the inference when doing a regression or correlation, however. Your data won't be exactly normal; the essential question is 'are the data so badly non-normal that the inference no longer has the properties you wish?'
Unfortunately, when I run the Kolmogorov–Smirnov test, the significance value is 0.021, which indicates that the residuals are not normal.
What were the specified population mean and variance of the residuals for your KS test and how did you get such population values?
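This question matters because the standard Kolmogorov–Smirnov p-values assume the reference distribution is fully specified in advance. If you plug the sample mean and standard deviation into the test, the p-values are no longer valid (the test becomes far too conservative); the Lilliefors correction, available as `statsmodels.stats.diagnostic.lilliefors`, is the usual fix. A simulation sketch of the problem, assuming SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, rejections = 50, 1000, 0
for _ in range(reps):
    x = rng.normal(size=n)
    # Plugging estimated parameters into kstest invalidates its p-values:
    p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    rejections += p < 0.05

# For truly normal data a valid 5%-level test should reject ~5% of the time;
# with estimated parameters the rejection rate comes out far lower.
print(rejections / reps)
```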
Could anybody please explain to me what to do.
I suggest you don't do a hypothesis test to assess the suitability of the assumption of normality, but instead to look at diagnostic displays that show you how badly non-normal the data are.
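The standard diagnostic display for this purpose is a normal Q–Q plot of the residuals. A sketch using `scipy.stats.probplot`, which returns the quantile pairs and a straight-line fit even without a plotting backend (the `residuals` array here is simulated stand-in data, not yours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
residuals = rng.normal(size=200)   # stand-in for your regression residuals

# probplot returns the (theoretical, ordered) quantile pairs plus a
# least-squares line fit; r near 1 means the points hug the line,
# i.e. no gross departure from normality.
(osm, osr), (slope, intercept, r) = stats.probplot(residuals)
print(round(r, 3))

# With matplotlib available, stats.probplot(residuals, plot=plt)
# draws the plot directly.
```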
Consider also whether a transformation might bring the variable closer to normality. But normality is overrated.
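For positive, right-skewed variables a log transform is the usual first thing to try. A quick illustration, using lognormal data (chosen for illustration because its log is exactly normal):

```python
import numpy as np

def skewness(x):
    d = x - np.mean(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(5)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # strongly right-skewed

print(skewness(x))          # large and positive
print(skewness(np.log(x)))  # near 0: log(x) is exactly normal here
```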