Hypothesis tests for Rayleigh variables

Tags: heteroscedasticity, hypothesis testing, Rayleigh distribution, statistical significance

Given samples from two Rayleigh-distributed random variables with unknown parameters, $X \sim R(\sigma_x), Y \sim R(\sigma_y)$, what tests can we use to determine if and to what extent their parameters ($\sigma_x, \sigma_y$) are different?

For Rayleigh variables we have a closed-form MLE for the parameter $\sigma$ (the corresponding estimator of $\sigma^2$ is unbiased), as well as closed-form confidence intervals for the estimate. We also have the convenient fact that the Rayleigh parameter squared is a variance, so presumably we could employ general tests for homoscedasticity.
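For concreteness, here's a minimal R sketch of that estimate and an exact interval, using the fact that $\sum_i x_i^2/\sigma^2 \sim \chi^2_{2n}$ (the helper name is just illustrative):

rayleigh_mle = function(x, level=0.95) {
  n = length(x)
  s2 = sum(x^2)/(2*n)       # MLE of sigma^2 (unbiased for sigma^2)
  a = (1 - level)/2
  ci = sum(x^2)/qchisq(c(1 - a, a), df=2*n)  # exact CI for sigma^2
  list(sigma=sqrt(s2), sigma_ci=sqrt(ci))    # back to the sigma scale
}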

Are there any tests or techniques specific to the Rayleigh distribution?

Best Answer

"Determine to what extent they're different" seems to be more suggesting a problem in estimation rather than testing. I will proceed to discuss tests, but this consideration may suggest you want to do something else.

I'll presume you want independence both within and across groups.

Since $X^2$ and $Y^2$ are exponential, the likelihood-ratio test for the exponential should yield a test equivalent to the LR test for the Rayleigh. We should then (with a little algebraic manipulation) end up with something equivalent to a test based on the ratio of the means of the squared observations.

For a one-tailed test, an $F$-test is obtained pretty directly from the ratio of the likelihoods: a test based on $F=\frac{\overline{x^2}}{\overline{y^2}}$ with $2n_x$ and $2n_y$ df should be equivalent to a likelihood ratio test. For the two-tailed case, the "best" test based on the likelihood ratio ($\Lambda$) won't be equal-tailed, though for large $n$ it won't matter.
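For instance, a minimal sketch of that exact F test (a helper of my own, assuming two numeric vectors x and y of Rayleigh values):

rayleigh_f_test = function(x, y) {
  Fstat = mean(x^2)/mean(y^2)            # ratio of means of the squares
  df1 = 2*length(x); df2 = 2*length(y)   # df are double the sample sizes
  p.lo = pf(Fstat, df1, df2)             # lower-tail probability
  c(F=Fstat, p.value=2*min(p.lo, 1 - p.lo))  # equal-tailed two-sided p-value
}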

The equal-tail ($\alpha/2$ in each tail) version of a two-sided test is, in general, not strictly an LRT (it doesn't reject exactly the set of smallest values of the likelihood ratio for the given $\alpha$), since in general one tail will contain values of the LR that are smaller than the smallest value in the other tail; you could typically get better power if you actually did the LRT (albeit that may be slightly more complicated to do). See the plot below.

However, a small caveat: from a scientific standpoint you might sometimes prefer an equal-tail test in spite of this potential loss of power. A critic might say that you're "favouring" one direction of the alternative over the other because it's easier to reject there.

Note that $\Lambda=\mathcal{L}_0/\mathcal{L}_1$ is the likelihood ratio calculated here under the Rayleigh assumption. You reject when the likelihood under $H_0$ is small relative to the likelihood under $H_1$, i.e. for small values of the likelihood ratio.

In the case where both $n$'s are large enough that we don't mind giving up a smidge of power for some added convenience, a simple "symmetric" (equal-tail) F test results. While you can do better in terms of power, it is exact and still pretty powerful.

See some discussion of that choice of test here. Note that the df are double the sample sizes. See also Scortchi's answer there for additional enlightening information.

Here's a plot that illustrates the distinction. It is based on 100000 pairs of samples from an exponential(1) (i.e. the squares of your Rayleigh values). The simulation is conducted under $H_0$; the specific value of the parameter is immaterial (it cancels). Here the x-sample values are a sample of size 8 and the y-sample values are a sample of size 12. The red values indicate the lower 2.5% and upper 2.5% of an F test (the rejection region for F). The blue dashed horizontal line is the 5% critical value for the likelihood ratio test (estimated here by simply taking the sample quantile from the simulation, though it shouldn't be hard to compute it exactly for this case).

[Plot of likelihood ratio vs F statistic for 100000 samples]

R code used to generate the plot is appended below. I generated a new plot because the old one (by chance, it seems) showed an unusually large difference between the tests; typically they're fairly close, as in the current plot.

Edit: extra detail where the action is, log-log scale:

[Expanded view of the region where the two approaches differ, showing the difference in likelihood at the 2.5% cutoffs more clearly]

The likelihood ratio test rejects some cases in the left tail of the F that the F test does not (i.e. it would place more than 2.5% of the null distribution in the rejection region corresponding to the left tail of the ratio of means), and it fails to reject some cases that the F test would reject in the right tail (placing less than 2.5% of the null distribution in the rejection region corresponding to the right tail of the ratio of means).

I think the two tests will always correspond when the sample sizes are equal (in that case $\Lambda$ is a monotonic decreasing function of $F + 1/F$, which treats the two tails of $F$ symmetrically).

However, you could also do a permutation test on $\Lambda$ (i.e. directly on the likelihood ratio), and so still have an exact test even if the Rayleigh assumption isn't quite right.

A permutation test is straightforward in the two-sample case, since under $H_0$ the observations are directly exchangeable. It gives you an exact test in small samples whether or not the data are actually Rayleigh, while at the same time it should be asymptotically efficient if they are. Doing a permutation version of a likelihood ratio test is not something you see often, to my surprise, given that it seems to offer a convenient way to have your cake (good power when your distributional model is right) and eat it too (correct significance level even if it's wrong).
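A rough sketch of that permutation test (names are mine; it permutes group membership of the pooled squared values and rejects for small $\Lambda$):

permLRT = function(x, y, B=9999) {
  # log likelihood ratio for two exponential samples:
  loglam = function(a, b)
    sum(dexp(c(a,b), 1/mean(c(a,b)), log=TRUE)) -
      (sum(dexp(a, 1/mean(a), log=TRUE)) + sum(dexp(b, 1/mean(b), log=TRUE)))
  z = c(x^2, y^2); nx = length(x)
  obs = loglam(x^2, y^2)
  sims = replicate(B, {i = sample(length(z), nx); loglam(z[i], z[-i])})
  mean(c(sims, obs) <= obs)   # p-value: small Lambda means reject
}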

If you're not so worried about small-sample exactness, you can use the asymptotic chi-squared null distribution of $-2\log\Lambda$ for likelihood ratio tests; or, when working on a computer, you can just fit a gamma GLM to the squares (with an appropriate choice of dispersion parameter in the test), or use parametric survival regression and fit a Weibull (again, with one parameter "known").
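A sketch of the first two of those, under the assumption that x and y hold the Rayleigh samples (variable names are mine):

x2 = x^2; y2 = y^2
# -2 log Lambda is asymptotically chi-squared with 1 df under H0:
logLam = sum(dexp(c(x2,y2), 1/mean(c(x2,y2)), log=TRUE)) -
  (sum(dexp(x2, 1/mean(x2), log=TRUE)) + sum(dexp(y2, 1/mean(y2), log=TRUE)))
pchisq(-2*logLam, df=1, lower.tail=FALSE)

# Gamma GLM on the squares, dispersion fixed at 1 (exponential = gamma with shape 1):
g = rep(c("x","y"), c(length(x), length(y)))
anova(glm(c(x2,y2) ~ g, family=Gamma(link="log")), test="Chisq", dispersion=1)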

If you're not at all confident of the Rayleigh assumption (so it might not even be "close"), maybe consider a Wilcoxon-Mann-Whitney. If you're nevertheless interested in an interval for the ratio of scales, there's always the possibility of taking logs, getting an interval for the shift in the logs (which good WMW implementations will give you), and backing out an interval for the ratio from that.
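A sketch of that back-transformation (my own snippet; since $\log X^2 = \log(2\sigma_x^2) + \log E$ for a standard exponential $E$, the shift in the logs of the squares is $\log(\sigma_x^2/\sigma_y^2)$):

wt = wilcox.test(log(x^2), log(y^2), conf.int=TRUE)
exp(wt$conf.int)        # interval for sigma_x^2 / sigma_y^2
sqrt(exp(wt$conf.int))  # corresponding interval for sigma_x / sigma_y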

There are many other things that could be done, but I think these points cover the main things I'd worry about in practice.


Summary

If you're confident it's really Rayleigh, and the sample sizes are very similar or at least not small, use the ratio of the means of the squared data in an F test ($F=\frac{\overline{x^2}}{\overline{y^2}}$) with $2n_x$ and $2n_y$ df.

Note also that there are additional details on this in an answer to this followup question: What is a likelihood ratio test for a specific distribution, and how does it relate to hypothesis tests?


Quick bit of R code for the plot:

n1=8; n2=12
nsim=100000

res=replicate(nsim, {
  x=rexp(n1); y=rexp(n2)
  mx=mean(x); my=mean(y)
  m0=mean(c(x,y))
  # Log likelihood of these samples under null hypothesis of equal means:
  logL0 = sum(dexp(c(x,y), 1/m0, log=TRUE))
  # Log-likelihood of these samples under the alternative hypothesis of different means:
  logL1 = sum(dexp(x, 1/mx, log=TRUE)) + sum(dexp(y, 1/my, log=TRUE))
  # Log-likelihood ratio and F-statistic for this simulation:
  c(logLam=logL0-logL1, F=mx/my)
 })

between = function(x,v) ((x>v[1])&(x<v[2]))
FC = qf(c(.025,.975), n1*2, n2*2)  # F-test critical values
plot(res[2,],exp(res[1,]),col=2-between(res[2,],FC),xlab="F",ylab="LR",pch=16,cex=.1)
q=quantile(exp(res[1,]),p=.05)  # estimate 5% point of LR under H0 
abline(h=q,col=4,lty=2)

feetwet (the OP) suggested (via an edit) adding the following Python code. I have accepted the edit, but I HAVE NOT CHECKED THIS CODE, so please keep that in mind.

Same in Python:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon, f
n1 = 8
n2 = 12
nsim = 100_000
L = np.zeros(nsim)  # Likelihood ratio
F = np.zeros(nsim)  # F-statistic (ratio of sample means)
for i in range(nsim):
    x = np.random.exponential(size=n1)
    y = np.random.exponential(size=n2)
    mx = np.mean(x)
    my = np.mean(y)
    m0 = np.mean(np.concatenate((x, y)))
    # Likelihood of these samples under null hypothesis of equal means:
    L0 = np.prod(expon.pdf(np.concatenate((x, y)), scale=m0))
    # Likelihood of these samples under the alternative hypothesis of different means:
    L1 = np.prod(expon.pdf(x, scale=mx)) * np.prod(expon.pdf(y, scale=my))
    L[i] = L0 / L1  # Likelihood ratio for this simulation
    F[i] = mx / my  # F statistic for this simulation:
FC = f.ppf([0.025, 0.975], n1*2, n2*2)  # F-test critical values
plt.scatter(F, L, c=['k' if FC[0] < x < FC[1] else 'r' for x in F], s=0.1)
plt.xlabel('F')
plt.ylabel('LR')
plt.axhline(y=np.quantile(L, q=0.05), color='b', linestyle='--')  # 5% critical value for LRT
plt.show()