Solved – One sample median test: Wilcoxon, sign test or chi squared

hypothesis testing, median

I would like to understand the differences between these tests for one sample median test and when to use each of them.

I have been searching information and this is what I got so far:

The sign test (in R, sign.test) does not require a symmetric probability distribution, but the Wilcoxon test (wilcox.test) does.
Wilcoxon is more powerful.

To check the symmetry of the probability distribution, I would look at a histogram, right?
What are the assumptions for each of them? When should I use them?

I am pretty lost here so any help is welcome.

Best Answer

The one-sample sign test tests whether the population median equals the hypothesized value.
The one-sample Wilcoxon test tests whether the population distribution is symmetric around the hypothesized value. More technically, it tests whether the sum of two randomly chosen deviations from the value is equally likely to be positive or negative. Note that rejecting this null hypothesis does not preclude the value from being the mean or median of the population. A rejection has two possible explanations: either the distribution is symmetric about some other value, or the distribution is not symmetric at all.
So if we do assume a symmetric population distribution, then the Wilcoxon test tests whether the population mean (= median) equals the value (under this assumption it is the nonparametric alternative to the one-sample t-test, which assumes normality). If you assume symmetry and hence test for the mean (= median), then the Wilcoxon test is more powerful, as a median test, than the more universal sign test above.
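A minimal sketch of running both tests in base R follows. The data vector `x` and the hypothesized value `m0` are made up for illustration, and the sign test is carried out here with `binom.test` on the count of values above `m0` rather than with an add-on package.

```r
# Hypothetical data: 30 draws from a symmetric distribution centred at 0.5
set.seed(1)
x  <- rnorm(30, mean = 0.5)
m0 <- 0   # hypothesized median (an assumption for this sketch)

# Sign test: count observations above m0 among those not tied with m0,
# then test whether the chance of exceeding m0 is 1/2.
above  <- sum(x > m0)
n_eff  <- sum(x != m0)
sign_p <- binom.test(above, n_eff, p = 0.5)$p.value

# Wilcoxon signed-rank test: additionally assumes symmetry about the median.
wilcox_p <- wilcox.test(x, mu = m0)$p.value

c(sign = sign_p, wilcoxon = wilcox_p)
```

Both calls are two-sided by default. Comparing the two p-values on a single simulated sample only illustrates the mechanics; it says nothing by itself about power.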

Synopsis

The count of data exceeding $3.5$ has a Binomial distribution with unknown probability $p$. Use this to conduct a Binomial test of $p=1/2$ against the alternative $p\ne 1/2$.

The rest of this post explains the underlying model and shows how to perform the calculations. It provides working R code to carry them out. An extended account of the underlying hypothesis testing theory is provided in my answer to "What is the meaning of p-values and t-values in statistical tests?".

The statistical model

Assuming the values are reasonably diverse (with few ties at $3.5$), then under your null hypothesis, any randomly sampled value has a $1/2=50\%$ chance of exceeding $3.5$ (since $3.5$ is characterized as the middle value of the population). Assuming all $250$ values were randomly and independently sampled, the number of them exceeding $3.5$ will therefore have a Binomial$(250,1/2)$ distribution. Let us call this number the "count," $k$.

On the other hand, if the population median differs from $3.5$, the chance of a randomly sampled value exceeding $3.5$ will differ from $1/2$. This is the alternative hypothesis.
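The null model can be checked with a small simulation. This is only a sketch: the original $250$ values are not shown, so the population below (an exponential distribution whose median is exactly $3.5$) and the number of replications are assumptions chosen for illustration.

```r
set.seed(42)
n    <- 250    # sample size from the text
reps <- 2000   # number of simulated samples (arbitrary choice)

# Hypothetical population: exponential with median 3.5, i.e. rate = log(2) / 3.5.
# For each simulated sample, record the count of values exceeding 3.5.
counts <- replicate(reps, sum(rexp(n, rate = log(2) / 3.5) > 3.5))

# Compare with the Binomial(250, 1/2) mean (125) and variance (62.5)
c(mean = mean(counts), var = var(counts))
```

Any continuous population with median $3.5$ could be substituted for the exponential; the distribution of the count should not depend on that choice.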

Finding a suitable test

The best way to distinguish the null situation from its alternatives is to look at the values of $k$ that are most likely under the null and less likely under the alternatives. These are the values near $1/2$ of $250$, equal to $125$. Thus, a critical region for your test consists of values relatively far from $125$: close to $0$ or close to $250$. But how far from $125$ must they be to constitute significant evidence that $3.5$ is not the population median?

It depends on your standard of significance: this is called the test size, often termed $\alpha$. Under the null hypothesis, there should be close to (but not more than) an $\alpha$ chance that $k$ will be in the critical region.

Ordinarily, when we have no preconceptions about which alternative will apply--a median greater or less than $3.5$--we try to construct the critical region so that there is half of that chance, $\alpha/2$, that $k$ is low and the other half, $\alpha/2$, that $k$ is high. Because we know the distribution of $k$ under the null hypothesis, this information is enough to determine the critical region.

Technically, there are two common ways to carry out the calculation: compute the Binomial probabilities or approximate them with a Normal distribution.

Calculation with binomial probabilities

Use the percentage point (quantile) function. In R, for instance, this is called qbinom and would be invoked like

alpha <- 0.05 # Test size
c(qbinom(alpha/2, 250, 1/2)-1, qbinom(1-alpha/2, 250, 1/2)+1)

The output for $\alpha=0.05$ is

109 141

It means that the critical region comprises all the low values of $k$ between (and including) $0$ and $109$, together with all the high values of $k$ between (and including) $141$ and $250$. As a check, we can ask R to calculate the chance that $k$ lies in that region when the null is true:

pbinom(109, 250, 1/2) + (1-pbinom(141-1, 250, 1/2))

The output is $0.0497$, very close to--but not greater than--$\alpha$ itself. Because the critical region must end at a whole number, it is not usually possible to make this actual test size exactly equal to the nominal test size $\alpha$, but in this case the two values are very close indeed.
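The two steps above (finding the region, then checking its actual size) can be wrapped into one helper for other sample sizes. The function name `sign_test_region` is hypothetical, introduced only for this sketch.

```r
# Two-sided critical region for the count k when k ~ Binomial(n, 1/2)
sign_test_region <- function(n, alpha = 0.05) {
  lower <- qbinom(alpha / 2, n, 1/2) - 1       # reject when k <= lower
  upper <- qbinom(1 - alpha / 2, n, 1/2) + 1   # reject when k >= upper
  # Actual test size: probability of landing in either tail under the null
  size  <- pbinom(lower, n, 1/2) + (1 - pbinom(upper - 1, n, 1/2))
  list(lower = lower, upper = upper, size = size)
}

region <- sign_test_region(250)
region
```

For $n=250$ and $\alpha=0.05$ this reproduces the limits $109$ and $141$ and the actual size of about $0.0497$ quoted above.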

Calculation with the normal approximation

The mean of a Binomial$(250, 1/2)$ distribution is $250\times 1/2=125$ and its variance is $250\times 1/2\times (1-1/2) = 250/4$, making its standard deviation equal to $\sqrt{250/4}\approx 7.9$. We will replace the Binomial distribution with a Normal distribution. The standard Normal distribution has $\alpha/2=0.05/2$ of its probability less than $-1.95996$, as computed by the R command

qnorm(alpha/2)

Because Normal distributions are symmetric, it also has $0.05/2$ of its probability greater than $+1.95996$. Therefore the critical region consists of values of $k$ that are more than $1.95996$ standard deviations away from $125$. Compute these thresholds: they equal $125 \pm 7.9\times 1.96 \approx 109.5, 140.5$. The calculation can be carried out in one swoop as

250*1/2 + sqrt(250*1/2*(1-1/2)) * qnorm(alpha/2) * c(1,-1)

Since $k$ has to be a whole number, we see it will fall into the critical region when it is $109$ or less or $141$ or greater. This answer is identical to the one obtained using the exact Binomial calculation. This typically is the case when $p$ is nearer $1/2$ than it is to $0$ or $1$, the sample size is moderate to large (tens or more), and $\alpha$ is not very small (a few percent).
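To see how well the two calculations agree at other sample sizes, they can be placed side by side. This is a rough sketch: the helper name `compare_regions` is hypothetical, and the sample sizes other than $250$ are arbitrary choices for illustration.

```r
# Compare exact Binomial and Normal-approximation critical regions under p = 1/2
compare_regions <- function(n, alpha = 0.05) {
  exact  <- c(qbinom(alpha / 2, n, 1/2) - 1, qbinom(1 - alpha / 2, n, 1/2) + 1)
  # Normal thresholds: mean -/+ z * sd, then round outward to whole numbers
  thresh <- n / 2 + sqrt(n / 4) * qnorm(alpha / 2) * c(1, -1)
  normal <- c(floor(thresh[1]), ceiling(thresh[2]))
  c(n = n, exact_lower = exact[1], exact_upper = exact[2],
    normal_lower = normal[1], normal_upper = normal[2])
}

# One row per sample size
t(sapply(c(20, 50, 250, 1000), compare_regions))
```

Rows where the exact and Normal columns differ indicate sample sizes at which the approximation shifts the critical region by a count or more.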

This test, because it assumes nothing about the population (except that it doesn't have a lot of probability focused right on its median), is not as powerful as other tests that make specific assumptions about the population. If the test nevertheless rejects the null, there's no need to be concerned about lack of power. Otherwise, you have to make some delicate trade-offs between what you are willing to assume and what you are able to conclude about the population.
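The power of this test against a specific alternative can be computed directly from the Binomial distribution. In the sketch below, `p` is the assumed chance that a sampled value exceeds $3.5$ under the alternative; the function name `sign_test_power` and the particular values of `p` are hypothetical choices for illustration.

```r
# Probability that k falls in the critical region {k <= lower} or {k >= upper}
# when each value exceeds 3.5 with probability p
sign_test_power <- function(p, n = 250, lower = 109, upper = 141) {
  pbinom(lower, n, p) + (1 - pbinom(upper - 1, n, p))
}

# At p = 0.5 this is the actual test size; for other p it is the power
pw <- sign_test_power(c(0.5, 0.55, 0.6, 0.65))
pw
```

How a given shift in the median translates into a value of `p` depends on the population's shape, which is exactly the kind of assumption this test avoids making.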
