Wikipedia has misled you in stating "...if both x and y are given and paired is TRUE, a Wilcoxon signed rank test of the null that the distribution ... of x - y (in the paired two sample case) is symmetric about mu is performed."
The test determines whether the RANK-TRANSFORMED values of $z_i = x_i - y_i$ are symmetric around the median you specify in your null hypothesis (I assume you'd use zero). Skewness is not a problem, since the signed-rank test, like most nonparametric tests, is "distribution free." The price you pay for these tests is often reduced power, but it looks like you have a large enough sample to overcome that.
A "what the hell" alternative to the rank-sum test might be to try a simple transformation like $\ln(x_i)$ and $\ln(y_i)$ on the off chance that these measurements might roughly follow a lognormal distribution--so the logged values should look "bell curvish". Then you could use a t test and convince yourself (and your boss who only took Business Stats) that the rank-sum test is working. If this works, there's a bonus: the t test on means for lognormal data is a comparison of medians for the original, untransformed, measurements.
Me? I'd do both, and anything else I could cook up (likelihood ratio test on Poisson counts by firm size?). Hypothesis testing is all about determining whether evidence is convincing, and some folks take a heap of convincin'.
Synopsis
The count of data exceeding $3.5$ has a Binomial distribution with unknown probability $p$. Use this to conduct a Binomial test of $p=1/2$ against the alternative $p\ne 1/2$.
The rest of this post explains the underlying model and shows how to perform the calculations. It provides working R
code to carry them out. An extended account of the underlying hypothesis testing theory is provided in my answer to "What is the meaning of p-values and t-values in statistical tests?".
The statistical model
Assuming the values are reasonably diverse (with few ties at $3.5$), then under your null hypothesis, any randomly sampled value has a $1/2=50\%$ chance of exceeding $3.5$ (since $3.5$ is characterized as the middle value of the population). Assuming all $250$ values were randomly and independently sampled, the number of them exceeding $3.5$ will therefore have a Binomial$(250,1/2)$ distribution. Let us call this number the "count," $k$.
On the other hand, if the population median differs from $3.5$, the chance of a randomly sampled value exceeding $3.5$ will differ from $1/2$. This is the alternative hypothesis.
Finding a suitable test
The best way to distinguish the null situation from its alternatives is to look at the values of $k$ that are most likely under the null and less likely under the alternatives. These are the values near $1/2$ of $250$, equal to $125$. Thus, a critical region for your test consists of values relatively far from $125$: close to $0$ or close to $250$. But how far from $125$ must they be to constitute significant evidence that $3.5$ is not the population median?
In depends on your standard of significance: this is called the test size, often termed $\alpha$. Under the null hypothesis, there should be close to--but not more than--an $\alpha$ chance that $k$ will be in the critical region.
Ordinarily, when we have no preconceptions about which alternative will apply--a median greater or less than $3.5$--we try to construct the critical region so that there is half of that chance, $\alpha/2$, that $k$ is low and the other half, $\alpha/2$, that $k$ is high. Because we know the distribution of $k$ under the null hypothesis, this information is enough to determine the critical region.
Technically, there are two common ways to carry out the calculation: compute the Binomial probabilities or approximate them with a Normal distribution.
Calculation with binomial probabilities
Use the percentage point (quantile) function. In R
, for instance, this is called qbinom
and would be invoked like
alpha <- 0.05 # Test size
c(qbinom(alpha/2, 250, 1/2)-1, qbinom(1-alpha/2, 250, 1/2)+1)
The output for $\alpha=0.05$ is
109 141
It means that the critical region comprises all the low values of $k$ between (and including) $0$ and $109$, together with all the high values of $k$ between (and including) $141$ and $250$. As a check, we can ask R
to calculate the chance that k
lies in that region when the null is true:
pbinom(109, 250, 1/2) + (1-pbinom(141-1, 250, 1/2))
The output is $0.0497$, very close to--but not greater than--$\alpha$ itself. Because the critical region must end at a whole number, it is not usually possible to make this actual test size exactly equal to the nominal test size $\alpha$, but in this case the two values are very close indeed.
Calculation with the normal approximation
The mean of a Binomial$(250, 1/2)$ distribution is $250\times 1/2=125$ and its variance is $250\times 1/2\times (1-1/2) = 250/4$, making its standard deviation equal to $\sqrt{250/4}\approx 7.9$. We will replace the Binomial distribution with a Normal distribution. The standard Normal distribution has $\alpha/2=0.05/2$ of its probability less than $-1.95996$, as computed by the R
command
qnorm(alpha/2)
Because Normal distributions are symmetric, it also has $0.05/2$ of its probability greater than $+1.95996$. Therefore the critical region consists of values of $k$ that are more than $1.95996$ standard deviations away from $125$. Compute these thresholds: they equal $125 \pm 7.9\times 1.96 \approx 109.5, 140.5$. The calculation can be carried out in one swoop as
250*1/2 + sqrt(250*1/2*(1-1/2)) * qnorm(alpha/2) * c(1,-1)
Since $k$ has to be a whole number, we see it will fall into the critical region when it is $109$ or less or $141$ or greater. This answer is identical to the one obtained using the exact Binomial calculation. This typically is the case when $p$ is nearer $1/2$ than it is to $0$ or $1$, the sample size is moderate to large (tens or more), and $\alpha$ is not very small (a few percent).
This test, because it assumes nothing about the population (except that it doesn't have a lot of probability focused right on its median), is not as powerful as other tests that make specific assumptions about the population. If the test nevertheless rejects the null, there's no need to be concerned about lack of power. Otherwise, you have to make some delicate trade-offs between what you are willing to assume and what you are able to conclude about the population.
Best Answer