Solved – Testing if data comes from a normal distribution with mean 0 and unknown variance in Matlab

hypothesis testing, MATLAB, normal distribution

Question

I have a vector of data, and I want to test whether it came from a normal distribution with mean zero and unknown variance. Is there a MATLAB function or simple script for this? If you don't know of anything MATLAB-specific, then a name and reference for the specific test is fine and I will just implement it myself.

Also, if the test can return a p-value instead of just a yes/no answer at a given significance level, that would be a benefit, but it is not essential.

What I already know

If I want to test if my data is from a normal distribution with mean 0 and variance 1 then I can use the Kolmogorov-Smirnov test. If I want to test if my data is from a normal distribution with unknown mean AND variance then I can use the Lilliefors test or the Jarque-Bera test. However, I want a fixed mean (= 0) and unknown variance.

Naive approach

The naive approach is to take my data $D$, calculate the variance around zero $\sigma^2_0$ and then renormalize my data by this to get a dataset $D'$. Then I can perform the Kolmogorov-Smirnov test on this. However, it is not clear how one would justify this, especially since the KS test specifically warns against testing against distributions with parameters estimated from the same data (renormalizing $D$ to $D'$ is the same as testing against a normal distribution with mean zero and variance $\sigma^2_0$). Is this naive approach justified?

Best Answer

You can use Spiegelhalter's test (1983, not the 'omnibus test' from 1977):

function pval = spiegel_test(x)
% compute pvalue under null of x normally distributed;
% x should be a vector;
% D. J. Spiegelhalter, 'Diagnostic tests of distributional shape,' 
% Biometrika, 1983
xm = mean(x);
xs = std(x);
xz = (x - xm) ./ xs;
xz2 = xz.^2;
N = sum(xz2 .* log(xz2));
n = numel(x);
ts = (N - 0.73 * n) / (0.8969 * sqrt(n)); %under the null, ts ~ N(0,1)
pval = 1 - abs(erf(ts / sqrt(2)));    %2-sided test. if only Matlab had R's pnorm function ...

I include code to test this under the null and under a few alternatives:

% under H0:
pvals = nan(10000,1);
for tt=1:numel(pvals);
    pvals(tt) = spiegel_test(randn(300,1));
end
mean(pvals < 0.05)

I get something like:

ans =

    0.0512

Under some alternatives:

%under Ha (using a Tukey g-distribution)
g = 0.4;
pvals = nan(10000,1);
for tt=1:numel(pvals);
    pvals(tt) = spiegel_test((exp(g * randn(300,1)) - 1)/g);
end
mean(pvals < 0.05)

%under Ha (using a Tukey h-distribution)
h = 0.1;
pvals = nan(10000,1);
for tt=1:numel(pvals);
    x = randn(300,1);
    pvals(tt) = spiegel_test(x .* exp(0.5 * h * x.^2));
end
mean(pvals < 0.05)

I get:

ans =

    0.8494


ans =

    0.8959

This test discards the knowledge that the mean must equal zero, so is perhaps less powerful than other tests. Spiegelhalter notes this test performs reasonably well for sample sizes greater than about 25, and is designed to test against symmetric alternatives (e.g. the Tukey h-distribution). It is less powerful against asymmetric alternatives.
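Since the test above does not use the zero-mean constraint, one possible alternative is to keep the naive statistic from the question (the KS distance to a zero-mean normal whose scale is estimated about zero) and calibrate it by simulation instead of with the standard KS tables. Because the statistic is unchanged when the data are rescaled, its null distribution does not depend on the unknown variance, so simulating from N(0,1) is enough. This is the same idea that underlies the Lilliefors test. The sketch below is only an illustration and is not taken from Spiegelhalter's paper; the function names `ks_zero_mean_mc` and `ks_stat_zero_mean` and the argument `nsim` are made up for this example. It uses base MATLAB only.

```matlab
function pval = ks_zero_mean_mc(x, nsim)
% Monte Carlo p-value for H0: x ~ N(0, sigma^2) with sigma unknown.
% x    : data vector
% nsim : number of simulated null samples (hypothetical argument name)
d_obs = ks_stat_zero_mean(x);
n = numel(x);
d_sim = nan(nsim, 1);
for k = 1:nsim
    % the statistic is scale invariant, so N(0,1) draws cover every sigma
    d_sim(k) = ks_stat_zero_mean(randn(n, 1));
end
% add-one Monte Carlo p-value, never exactly zero
pval = (1 + sum(d_sim >= d_obs)) / (nsim + 1);
end

function d = ks_stat_zero_mean(x)
% KS distance between the empirical CDF of x and N(0, s0^2),
% where s0 is the root mean square of x (scale estimated about zero).
x = sort(x(:));
n = numel(x);
s0 = sqrt(mean(x.^2));
F = 0.5 * (1 + erf(x ./ (s0 * sqrt(2))));   % normal CDF via erf
d = max(max((1:n)' / n - F), max(F - (0:n-1)' / n));
end
```

Saved as ks_zero_mean_mc.m, it could be called as `pval = ks_zero_mean_mc(x, 10000);`. Larger values of `nsim` give a finer p-value resolution at the cost of run time.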

Related Solutions

Solved – How to test the data against a specific normal distribution

ks.test in R lets you specify the mean and sd of the distribution to test against, e.g.

x <- rnorm(1000, 4, 10)
ks.test(x, "pnorm", mean = 4, sd = 10)
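Since the original question is about MATLAB, a rough counterpart of the R call might look like the following. This is a sketch that assumes the Statistics and Machine Learning Toolbox is installed, since it provides `kstest` and `makedist`. As with `ks.test`, the p-value is only valid when the mean and standard deviation are fixed in advance and not estimated from the same data.

```matlab
% Requires the Statistics and Machine Learning Toolbox (assumption).
x = 4 + 10 * randn(1000, 1);                     % same setup as the R example
pd = makedist('Normal', 'mu', 4, 'sigma', 10);   % fully specified null distribution
[h, p, ksstat] = kstest(x, 'CDF', pd);           % h = 1 means reject at the 5% level
```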

Solved – Estimating mean of Normal with unknown variance and then predict the future observation

Quoting from our Bayesian Essentials with R book,

if $\mathscr{D}_n$ denotes a normal $\mathscr{N}\left(\mu,\sigma^{2}\right)$ sample of size $n$, if $\mu$ has a prior equal to a $\mathscr{N}\left(0,\sigma^{2}\right)$ distribution, and $\sigma^{-2}$ an exponential $\mathscr{E}(1)$ distribution, the posterior is given by \begin{align*} \pi((\mu,\sigma^2)|\mathscr{D}_n) &\propto \pi(\sigma^2)\times\pi(\mu|\sigma^2)\times f(\mathscr{D}_n|\mu,\sigma^2)\\ & \propto (\sigma^{-2})^{1/2+2}\, \exp\left\{-(\mu^2 + 2)/2\sigma^2\right\}\\ & \times (\sigma^{-2})^{n/2}\,\exp \left\{-\left(n(\mu-\overline{x})^2 + s^2 \right)/2\sigma^2\right\} \\ &\propto (\sigma^2)^{-(n+5)/2}\exp\left\{-\left[(n+1) (\mu-n\bar x/(n+1))^2+(2+s^2)\right]/2\sigma^2\right\}\\ &\propto (\sigma^2)^{-1/2}\exp\left\{-(n+1)[\mu-n\bar x/(n+1)]^2/2\sigma^2\right\}\\ &\times (\sigma^2)^{-(n+2)/2-1}\exp\left\{-(2+s^2)/2\sigma^2\right\}\,. \end{align*} Therefore, the posterior on $\theta=(\mu,\sigma^2)$ can be decomposed as the product of an inverse gamma distribution on $\sigma^2$, $$\mathscr{IG}((n+2)/2,[2+s^2]/2)$$ which is the distribution of the inverse of a gamma $$\mathscr{G}((n+2)/2,[2+s^2]/2)$$ random variable and, conditionally on $\sigma^2$, a normal distribution on $\mu$, $$\mathscr{N} (n\bar x/(n+1),\sigma^2/(n+1)).$$ The marginal posterior in $\mu$ is then a Student's $t$ distribution $$ \mu|\mathscr{D}_n \sim \mathscr{T}\left(n+2,n\bar x\big/(n+1),(2+s^2)\big/(n+1)(n+2)\right)\,, $$ with $n+2$ degrees of freedom, a location parameter proportional to $\bar x$ and a scale parameter almost proportional to $s$.

From this distribution, you get the expectation $n\bar x/(n+1)$, which acts as your point estimator of $\mu$, and a credible interval on $\mu$ $$\left(n\bar x/(n+1)-((2+s^2)/(n+1)(n+2))^{1/2}q_{n+2}(\alpha),n\bar x/(n+1)+((2+s^2)/(n+1)(n+2))^{1/2}q_{n+2}(\alpha)\right)$$ where $q_{n+2}(\alpha)$ is the $t_{n+2}$ quantile.