Testing if data comes from a normal distribution with mean 0 and unknown variance in MATLAB

hypothesis testing, MATLAB, normal distribution

Question

I have a vector of data, and I want to test if it came from a normal distribution with mean zero and unknown variance. Do you know if there is a MATLAB function or simple script for this? If you don't know anything MATLAB-specific, then a name and reference for the test is fine and I will just implement it myself.

Also, if the test can return a p-value instead of just a yes/no answer at a given significance level, that would be a benefit, but it is not essential.


What I already know

If I want to test if my data is from a normal distribution with mean 0 and variance 1, then I can use the Kolmogorov-Smirnov test. If I want to test if my data is from a normal distribution with unknown mean AND variance, then I can use the Lilliefors test or the Jarque-Bera test. However, I want a fixed mean (= 0) and unknown variance.

Naive approach

The naive approach is to take my data $D$, calculate the variance around zero $\sigma^2_0$, and rescale the data by $\sigma_0$ to get a dataset $D'$. I can then perform the Kolmogorov-Smirnov test on $D'$. However, it is not clear how one would justify this, especially since the KS test specifically warns against testing against distributions whose parameters were estimated from the same data (rescaling $D$ to $D'$ is the same as testing $D$ against a normal distribution with mean zero and variance $\sigma^2_0$). Is this naive approach justified?

Best Answer

You can use Spiegelhalter's test (1983, not the 'omnibus test' from 1977):

function pval = spiegel_test(x)
% compute pvalue under null of x normally distributed;
% x should be a vector;
% D. J. Spiegelhalter, 'Diagnostic tests of distributional shape,' 
% Biometrika, 1983
xm = mean(x);
xs = std(x);
xz = (x - xm) ./ xs;
xz2 = xz.^2;
N = sum(xz2 .* log(xz2));
n = numel(x);
ts = (N - 0.73 * n) / (0.8969 * sqrt(n)); %under the null, ts ~ N(0,1)
pval = erfc(abs(ts) / sqrt(2));    % 2-sided p-value, 2*(1 - Phi(|ts|)); erfc avoids cancellation for large |ts|
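
Written out, the function above computes the following. This is a transcription of the code rather than of the paper, so the symbols $z_i$, $N$, $T$ and $\Phi$ (the standard normal CDF) are notation chosen here for illustration:

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad
N = \sum_{i=1}^{n} z_i^2 \log z_i^2, \qquad
T = \frac{N - 0.73\,n}{0.8969\,\sqrt{n}}, \qquad
p = 2\,\bigl(1 - \Phi(|T|)\bigr)
$$

Here $\bar{x}$ and $s$ are `mean(x)` and `std(x)` (the latter with MATLAB's default $n-1$ divisor). The last line of the function evaluates $p$ through the error function, using $\Phi(t) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(t/\sqrt{2})\bigr)$.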

I include code to test this under the null and under a few alternatives:

% under H0: the rejection rate at the 5% level should be close to 0.05
pvals = nan(10000,1);
for tt = 1:numel(pvals)
    pvals(tt) = spiegel_test(randn(300,1));
end
mean(pvals < 0.05)   % empirical rejection rate

I get something like:

ans =

    0.0512

Under some alternatives:

%under Ha (using a Tukey g-distribution)
g = 0.4;
pvals = nan(10000,1);
for tt=1:numel(pvals);
    pvals(tt) = spiegel_test((exp(g * randn(300,1)) - 1)/g);
end
mean(pvals < 0.05)

%under Ha (using a Tukey h-distribution)
h = 0.1;
pvals = nan(10000,1);
for tt=1:numel(pvals);
    x = randn(300,1);
    pvals(tt) = spiegel_test(x .* exp(0.5 * h * x.^2));
end
mean(pvals < 0.05)

I get (g-distribution first, then h-distribution):

ans =

    0.8494


ans =

    0.8959

This test discards the knowledge that the mean must equal zero, so it is perhaps less powerful than a test that uses that information. Spiegelhalter notes that the test performs reasonably well for sample sizes greater than about 25, and that it is designed to test against symmetric alternatives (e.g. the Tukey h-distribution). It is less powerful against asymmetric alternatives.
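
If the known mean should be used, one option is a Lilliefors-style Monte Carlo calibration of the Kolmogorov-Smirnov statistic. This is not part of Spiegelhalter's test and is offered only as a sketch. The idea is to scale the data by its root mean square about zero and compare the result with the standard normal CDF. Under the null, dividing by the root mean square cancels the unknown $\sigma$, so the statistic's null distribution does not depend on the variance and can be simulated with `randn`. The function names `ks_zero_mean_test` and `ks_stat_zero_mean` and the default `nsim = 2000` are illustrative assumptions:

```matlab
function pval = ks_zero_mean_test(x, nsim)
% Hypothetical helper (not from the original answer): Monte Carlo p-value for
% H0: x ~ N(0, sigma^2) with sigma unknown, in the spirit of the Lilliefors test.
if nargin < 2
    nsim = 2000;                     % number of simulated null samples (assumed default)
end
x = x(:);
n = numel(x);
d_obs = ks_stat_zero_mean(x);
% Under H0 the statistic does not depend on sigma, so simulate with sigma = 1.
d_sim = zeros(nsim, 1);
for k = 1:nsim
    d_sim(k) = ks_stat_zero_mean(randn(n, 1));
end
% Add-one correction keeps the estimated p-value strictly positive.
pval = (1 + sum(d_sim >= d_obs)) / (nsim + 1);
end

function d = ks_stat_zero_mean(x)
% KS distance between the empirical CDF of x / s0 and the standard normal CDF,
% where s0 is the root mean square of x (scale taken about zero, not about mean(x)).
n = numel(x);
s0 = sqrt(mean(x.^2));
z = sort(x(:) / s0);
F = 0.5 * erfc(-z / sqrt(2));        % standard normal CDF without the Statistics Toolbox
idx = (1:n)';
d = max(max(idx / n - F), max(F - (idx - 1) / n));
end
```

Calling `ks_zero_mean_test(x)` returns a p-value whose resolution is about `1/nsim`. Its power has not been compared with that of Spiegelhalter's test here, so treat it as a starting point rather than a recommendation.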
