Solved – Test for median difference

hypothesis testing, median, statistical significance

Given samples from two distributions, I am looking for a test of median difference (i.e., one that rejects the null in favor of evidence that the medians are different). I do not want to assume anything about either distribution. Is there a standard test for this situation?

I know Mood's median test, but I believe it assumes that the distributions are shifted versions of each other: $F_2(t) = F_1(t-a)$ for some $a \in \mathbb{R}$. I back this claim with these sources:

Buthmann, A. (2017). "Understanding the Uses for Mood’s Median Test". iSixSigma blog post.

Taylor, A. D. (2012). "Mood’s Median Test". Handout (PDF link via Wayback Machine). 2 pages.

Glen, S. (2016). "Mood’s Median Test: Definition, Run the Test and Interpret Results". Statistics How To: Elementary statistics for the rest of us blog post.

Best Answer

You could consider a permutation test.

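# Two-sided permutation test for a difference in medians
median.test <- function(x,y, NREPS=1e4) {
  z <- c(x,y)                                  # pooled sample
  i <- rep.int(0:1, c(length(x), length(y)))   # group labels (0 = x, 1 = y)
  v <- diff(tapply(z,i,median))                # observed difference: median(y) - median(x)
  v.rep <- replicate(NREPS, {
    diff(tapply(z,sample(i),median))           # difference after shuffling the labels
  })
  v.rep <- c(v, v.rep)                         # include the observed value in the reference set
  min(mean(v < v.rep), mean(v > v.rep))*2      # double the smaller tail proportion
}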

set.seed(123)
n1 <- 100
n2 <- 200
## the two samples
x <- rnorm(n1, mean=1)
y <- rexp(n2, rate=1)
median.test(x,y)

This gives a two-sided p-value of 0.1112, which is a testament to how inefficient a median test can be when we don't appeal to any distributional assumptions.

If we used maximum likelihood instead, the 95% CI for the median of the normal sample can be taken directly from the mean, since the mean is the median in a normal distribution; that gives 1.00 to 1.18. The median of an exponential distribution with rate $\lambda$ is $\log(2)/\lambda$, which is estimated by $\log(2)\,\bar{X}$ because the MLE of the rate is $1/\bar{X}$; by the delta method its 95% CI is 0.63 to 0.80. The two intervals do not overlap, so the Wald test is statistically significant at the 0.05 level, whereas the median test is not.
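The Wald calculation described above could be sketched as follows. This is only an illustration under stated assumptions: it regenerates the samples with the same seed, uses the ordinary sample standard deviation for the normal standard error, and the names `ci_x`, `ci_y`, `wald_z` and `wald_p` are introduced here. The intervals it prints need not match the figures quoted above, which depend on how the standard errors were computed.

```r
set.seed(123)
x <- rnorm(100, mean = 1)   # same simulated samples as in the permutation example
y <- rexp(200, rate = 1)

z <- qnorm(0.975)           # two-sided 95% normal quantile

# Normal sample: the estimated median is the sample mean.
# Assumption: sd(x) (n - 1 denominator) is used in place of the strict MLE of sigma.
med_x <- mean(x)
se_x  <- sd(x) / sqrt(length(x))
ci_x  <- med_x + c(-1, 1) * z * se_x

# Exponential sample: median = log(2) / rate, and the MLE of 1/rate is mean(y).
# Var(mean(y)) is estimated by mean(y)^2 / n, so the delta method gives
# se(median) = log(2) * mean(y) / sqrt(n).
med_y <- log(2) * mean(y)
se_y  <- log(2) * mean(y) / sqrt(length(y))
ci_y  <- med_y + c(-1, 1) * z * se_y

# Wald test of equal medians, treating the two estimates as independent
wald_z <- (med_x - med_y) / sqrt(se_x^2 + se_y^2)
wald_p <- 2 * pnorm(-abs(wald_z))

print(rbind(normal = ci_x, exponential = ci_y))
print(wald_p)
```

The point of the comparison is that the parametric intervals are only as trustworthy as the normal and exponential assumptions behind them.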

Synopsis

The count of data exceeding $3.5$ has a Binomial distribution with unknown probability $p$. Use this to conduct a Binomial test of $p=1/2$ against the alternative $p\ne 1/2$.

The rest of this post explains the underlying model and shows how to perform the calculations. It provides working R code to carry them out. An extended account of the underlying hypothesis testing theory is provided in my answer to "What is the meaning of p-values and t-values in statistical tests?".

The statistical model

Assuming the values are reasonably diverse (with few ties at $3.5$), then under your null hypothesis, any randomly sampled value has a $1/2=50\%$ chance of exceeding $3.5$ (since $3.5$ is characterized as the middle value of the population). Assuming all $250$ values were randomly and independently sampled, the number of them exceeding $3.5$ will therefore have a Binomial$(250,1/2)$ distribution. Let us call this number the "count," $k$.

On the other hand, if the population median differs from $3.5$, the chance of a randomly sampled value exceeding $3.5$ will differ from $1/2$. This is the alternative hypothesis.
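As a minimal sketch of this model, the count $k$ can be computed directly and handed to R's built-in `binom.test`. The data vector `values` below is simulated purely for illustration (an exponential sample whose true median is $3.5$); it is not the data from the question.

```r
set.seed(1)
# Hypothetical data: 250 draws from an exponential distribution with median 3.5
values <- rexp(250, rate = log(2) / 3.5)

m0 <- 3.5                  # hypothesized population median
k  <- sum(values > m0)     # the "count": number of values exceeding m0
n  <- sum(values != m0)    # drop exact ties at m0, if any

# Exact two-sided Binomial test of p = 1/2
result <- binom.test(k, n, p = 1/2, alternative = "two.sided")
print(k)
print(result$p.value)
```

Dropping ties is one common convention; with continuous data there are none and `n` stays at 250.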

Finding a suitable test

The best way to distinguish the null situation from its alternatives is to look at the values of $k$ that are most likely under the null and less likely under the alternatives. These are the values near $1/2$ of $250$, equal to $125$. Thus, a critical region for your test consists of values relatively far from $125$: close to $0$ or close to $250$. But how far from $125$ must they be to constitute significant evidence that $3.5$ is not the population median?

It depends on your standard of significance: this is called the test size, often denoted $\alpha$. Under the null hypothesis, there should be close to--but not more than--an $\alpha$ chance that $k$ will be in the critical region.

Ordinarily, when we have no preconceptions about which alternative will apply--a median greater or less than $3.5$--we try to construct the critical region so that there is half of that chance, $\alpha/2$, that $k$ is low and the other half, $\alpha/2$, that $k$ is high. Because we know the distribution of $k$ under the null hypothesis, this information is enough to determine the critical region.

Technically, there are two common ways to carry out the calculation: compute the Binomial probabilities or approximate them with a Normal distribution.

Calculation with binomial probabilities

Use the percentage point (quantile) function. In R, for instance, this is called qbinom and would be invoked like

alpha <- 0.05 # Test size
c(qbinom(alpha/2, 250, 1/2)-1, qbinom(1-alpha/2, 250, 1/2)+1)

The output for $\alpha=0.05$ is

109 141

It means that the critical region comprises all the low values of $k$ between (and including) $0$ and $109$, together with all the high values of $k$ between (and including) $141$ and $250$. As a check, we can ask R to calculate the chance that $k$ lies in that region when the null is true:

pbinom(109, 250, 1/2) + (1-pbinom(141-1, 250, 1/2))

The output is $0.0497$, very close to--but not greater than--$\alpha$ itself. Because the critical region must end at a whole number, it is not usually possible to make this actual test size exactly equal to the nominal test size $\alpha$, but in this case the two values are very close indeed.
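The two commands above can be wrapped into one helper that returns the critical region together with its actual size for any sample size. This is a sketch; the function name `sign_test_region` is made up for this illustration.

```r
# Critical region and actual size of the two-sided Binomial (sign) test of p = 1/2.
# Rejects when k <= lower or k >= upper.
sign_test_region <- function(n, alpha = 0.05) {
  lower <- qbinom(alpha / 2, n, 1/2) - 1        # largest k in the low tail
  upper <- qbinom(1 - alpha / 2, n, 1/2) + 1    # smallest k in the high tail
  size  <- pbinom(lower, n, 1/2) +
           pbinom(upper - 1, n, 1/2, lower.tail = FALSE)
  list(lower = lower, upper = upper, size = size)
}

region <- sign_test_region(250)
print(unlist(region))
```

By construction each tail carries at most $\alpha/2$, so the reported size never exceeds the nominal $\alpha$.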

Calculation with the normal approximation

The mean of a Binomial$(250, 1/2)$ distribution is $250\times 1/2=125$ and its variance is $250\times 1/2\times (1-1/2) = 250/4$, making its standard deviation equal to $\sqrt{250/4}\approx 7.9$. We will replace the Binomial distribution with a Normal distribution. The standard Normal distribution has $\alpha/2=0.05/2$ of its probability less than $-1.95996$, as computed by the R command

qnorm(alpha/2)

Because Normal distributions are symmetric, it also has $0.05/2$ of its probability greater than $+1.95996$. Therefore the critical region consists of values of $k$ that are more than $1.95996$ standard deviations away from $125$. Compute these thresholds: they equal $125 \pm 7.9\times 1.96 \approx 109.5, 140.5$. The calculation can be carried out in one swoop as

250*1/2 + sqrt(250*1/2*(1-1/2)) * qnorm(alpha/2) * c(1,-1)

Since $k$ has to be a whole number, we see it will fall into the critical region when it is $109$ or less or $141$ or greater. This answer is identical to the one obtained using the exact Binomial calculation. This typically is the case when $p$ is nearer $1/2$ than it is to $0$ or $1$, the sample size is moderate to large (tens or more), and $\alpha$ is not very small (a few percent).
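To see how well the two calculations agree at other sample sizes, one could tabulate both critical regions side by side. The helper names `exact_region` and `normal_region` and the list of sample sizes are assumptions made for this sketch.

```r
# Exact Binomial critical region (reject when k <= first or k >= second value)
exact_region <- function(n, alpha = 0.05) {
  c(qbinom(alpha / 2, n, 1/2) - 1, qbinom(1 - alpha / 2, n, 1/2) + 1)
}

# Normal-approximation critical region, rounded outward to whole numbers
normal_region <- function(n, alpha = 0.05) {
  cutoffs <- n / 2 + sqrt(n / 4) * qnorm(alpha / 2) * c(1, -1)
  c(floor(cutoffs[1]), ceiling(cutoffs[2]))
}

sizes <- c(10, 25, 50, 100, 250, 1000)
comparison <- t(sapply(sizes, function(n) {
  c(n = n, exact = exact_region(n), normal = normal_region(n))
}))
print(comparison)
```

Rows where the `exact` and `normal` columns differ show where the approximation would change the decision for some values of $k$.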

This test, because it assumes nothing about the population (except that it doesn't have a lot of probability focused right on its median), is not as powerful as other tests that make specific assumptions about the population. If the test nevertheless rejects the null, there's no need to be concerned about lack of power. Otherwise, you have to make some delicate trade-offs between what you are willing to assume and what you are able to conclude about the population.
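The loss of power can be illustrated with a small simulation. The setup below is hypothetical: it assumes a Normal population with standard deviation $1$ whose true median is $3.7$, and compares how often the Binomial test and a one-sample t test reject the null median of $3.5$.

```r
set.seed(42)
n <- 250             # sample size, as in the example above
m0 <- 3.5            # hypothesized median
true_median <- 3.7   # assumed true median for this illustration
reps <- 2000         # number of simulated samples

reject <- replicate(reps, {
  x <- rnorm(n, mean = true_median, sd = 1)
  c(sign = binom.test(sum(x > m0), n, p = 1/2)$p.value < 0.05,
    t    = t.test(x, mu = m0)$p.value < 0.05)
})

# Proportion of simulated samples in which each test rejects at the 0.05 level
power <- rowMeans(reject)
print(power)
```

The t test does better here only because its Normality assumption is true by construction; with heavy-tailed or skewed data the comparison can go the other way.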
