MATLAB: Kstest – normal

Tags: confusion, kstest, MATLAB, normal, normality, Statistics and Machine Learning Toolbox

Hi, I am confused by the description of the 'kstest' function. Usually '1' means true and '0' means false, and the purpose of this function is to test whether or not a set of data is normally distributed. However, from what I gather from the description, '0' is returned when the data is normally distributed, and '1' is returned when it is not.

Is this the correct interpretation? The example is also a little confusing:

x = -2:1:4
x =
   -2 -1 0 1 2 3 4

[h,p,k,c] = kstest(x,[],0.05,0)
h =
   0
p =
   0.13632
k =
   0.41277
c =
   0.48342

These data are linear, not normally distributed. Yet kstest returns '0', which means it classifies these data as normal. Is this a limitation of kstest with small samples?

From what I read, the resolution is thus to use the 'smaller' or 'larger' tag to correct for this problem, but is there any clear cut-off for what is 'smaller' and what is 'larger'?

Lastly, if I were to use this test in a publication and say that our data was 'normal' (this function returned 0) or failed to be classified as 'normal' (this function returned 1) with this test and I used the 'smaller' or 'larger' tags, how does that change the name of the test? It can't be the same test if it is returning different values. How would I explain this?

Best Answer

Your example (taken from the documentation) "illustrates the difficulty of testing normality in small samples." If you plot

normplot(x)

you'll see that the deviations from a standard normal distribution occur in the two outer points. It doesn't take a lot more data to get a reasonable result, though:

x = -2:0.5:4;
[h,p,k,c] = kstest(x,[],0.05,0)
h =
     1

p =

    0.0245

k =

    0.3947

c =

    0.3614

Keep in mind, too, their comment about the Lilliefors test - it is more likely to be the one you want.
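As a rough sketch of that suggestion, the snippet below runs both tests on the same x. It assumes the Statistics and Machine Learning Toolbox is installed; the variable names are only illustrative. kstest checks against a fully specified N(0,1), while lillietest checks for a normal distribution whose mean and variance are estimated from the data.

```matlab
% Sketch: compare kstest (null: standard normal N(0,1)) with lillietest
% (null: normal with unknown mean and variance) on the x used above.
x = -2:0.5:4;

% kstest compares against N(0,1) by default, so a shifted or rescaled
% normal sample would be rejected as well.
hKs = kstest(x);

% Standardizing with the sample mean and standard deviation and then calling
% kstest is not a valid workaround: the K-S critical values assume the
% parameters were not estimated from the data. lillietest handles that case.
[hLillie, pLillie] = lillietest(x);

fprintf('kstest h = %d, lillietest h = %d (p = %.3f)\n', hKs, hLillie, pLillie);
```

Note that lillietest reports p-values only within a tabulated range and warns when the value falls outside it.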

Related Solutions

MATLAB: Kstest and hypothesis rejecting

When in doubt, use help()

help kstest
 kstest Single sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
    H = kstest(X) performs a Kolmogorov-Smirnov (K-S) test to determine if
    a random sample X could have come from a standard normal distribution,
    N(0,1). H indicates the result of the hypothesis test:
       H = 0 => Do not reject the null hypothesis at the 5% significance
       level.
       H = 1 => Reject the null hypothesis at the 5% significance
       level.
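To make the meaning of H concrete, here is a minimal sketch on synthetic data (the seed and sample size are arbitrary choices, not from the original post). H is a "reject the null hypothesis" flag, so H = 0 means the test found insufficient evidence against the hypothesized distribution, not proof that the data are normal.

```matlab
% Minimal sketch: h is a reject / do-not-reject flag, not an "is normal" flag.
rng(0);                  % fixed seed, arbitrary choice for reproducibility
x = randn(200, 1);       % synthetic sample drawn from N(0,1)

alpha = 0.05;            % significance level
[h, p] = kstest(x, 'Alpha', alpha);

if h
    fprintf('p = %.3f: reject the null hypothesis of N(0,1) at alpha = %.2f.\n', p, alpha);
else
    fprintf('p = %.3f: insufficient evidence against N(0,1) at alpha = %.2f.\n', p, alpha);
end
```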

The null hypothesis depends on your inputs. Again, from the documentation (doc kstest):

Test the null hypothesis that the data comes from a normal distribution with 
a mean of 75 and a standard deviation of 10. Use these parameters to center 
and scale each element of the data vector since, by default, kstest tests for 
a standard normal distribution.
x = (test1-75)/10;
h = kstest(x)
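A possible alternative to standardizing by hand is to pass the hypothesized distribution through the 'CDF' option. The sketch below assumes the examgrades sample data set that ships with the toolbox (the source of test1 in the documentation example); pd, h1, and h2 are illustrative names.

```matlab
% Assumes the examgrades sample data set from the Statistics and
% Machine Learning Toolbox.
load examgrades
test1 = grades(:, 1);

% Option 1 (as in the documentation excerpt): standardize, then test
% against the default N(0,1).
x = (test1 - 75) / 10;
h1 = kstest(x);

% Option 2: leave the data unchanged and pass the hypothesized
% distribution N(75, 10) as a probability distribution object.
pd = makedist('Normal', 'mu', 75, 'sigma', 10);
h2 = kstest(test1, 'CDF', pd);

% Both calls test the same null hypothesis, so the decisions should agree.
fprintf('h1 = %d, h2 = %d\n', h1, h2);
```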

MATLAB: How to go about finding the standard normal probability based on the z-score

doc normcdf
doc normpdf

When you know what you want but are not sure of the name, try something like

>> lookfor normal
realmin                        - Smallest positive normalized floating point number.
randn                          - Normally distributed pseudorandom numbers.
sprandn                        - Sparse normally distributed random matrix.
surfnorm                       - Surface normals.
isonormals                     - Isosurface normals.
cde                            - cd elliptic function with normalized complex argument.
sne                            - sn elliptic function with normalized complex argument.
addfreqcsmenu                  - Add a cs menu to switch between linear and normalized frequency
convertfrequnits               - converts between Normalized, Hz, kHz, etc
histfit                        - Histogram with superimposed fitted normal density.
jbtest                         - Jarque-Bera hypothesis test of composite normality.
lhsnorm                        - Generate a latin hypercube sample with a normal distribution
logncdf                        - Lognormal cumulative distribution function (cdf).
lognfit                        - Parameter estimates and confidence intervals for lognormal data.
logninv                        - Inverse of the lognormal cumulative distribution function (cdf).
lognlike                       - Negative log-likelihood for the lognormal distribution.
lognpdf                        - Lognormal probability density function (pdf).
lognrnd                        - Random arrays from the lognormal distribution.
lognstat                       - Mean and variance for the lognormal distribution.
mvncdf                         - Multivariate normal cumulative distribution function (cdf).
mvnpdf                         - Multivariate normal probability density function (pdf).
mvnrnd                         - Random vectors from the multivariate normal distribution.
normcdf                        - Normal cumulative distribution function (cdf).
normfit                        - Parameter estimates and confidence intervals for normal data.
norminv                        - Inverse of the normal cumulative distribution function (cdf).
normlike                       - Negative log-likelihood for the normal distribution.
normpdf                        - Normal probability density function (pdf).
normplot                       - Displays a normal probability plot.
normrnd                        - Random arrays from the normal distribution.
normspec                       - Plots normal density between specification limits.
normstat                       - Mean and variance for the normal distribution.
logn3fit                       - Fit a 3-param lognormal dist'n using cumulative probabilities.
wgtnormfit                     - Fitting example for a weighted normal distribution.
wgtnormfit2                    - Fitting example for a weighted normal distribution (log(sigma) parameterization).
>>

Judicious search terms help, but seeing the list of things related to "normal" lets you find the two functions of interest (plus a lot more, depending on which toolboxes are installed) that might be of use.
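For the z-score question itself, a short sketch using normcdf and normpdf might look like this (z = 1.96 is just an example value):

```matlab
% Sketch: standard normal probabilities for a given z-score.
z = 1.96;                 % example z-score

pBelow  = normcdf(z);     % P(Z <= z), approximately 0.975 for z = 1.96
pAbove  = normcdf(-z);    % P(Z > z), using the symmetry of the normal
density = normpdf(z);     % height of the density at z (not a probability)

fprintf('P(Z <= %.2f) = %.4f, P(Z > %.2f) = %.4f\n', z, pBelow, z, pAbove);
```

For a normal distribution that is not standard, normcdf(x, mu, sigma) takes the mean and standard deviation directly, so the z-score does not have to be computed first.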
