Hypothesis Testing – How to Formulate Null Hypothesis for One-Sided Tests

hypothesis testing

With a one-sided test, we might want to assess whether a sample mean is greater than some theoretical mean (or the other way round):

$H_A: \mu_S > \mu_T $

What confuses me is that even for a one-sided test the null hypothesis is described as equality between the means, i.e., $H_0: \mu_S = \mu_T$. Why is that? Would the opposite of $H_A$ in this case not be that our sample mean is either equal to OR smaller than the theoretical value, i.e., $H_0: \mu_S \le \mu_T$?

Best Answer

This is a common approach in some introductory statistics textbooks. The alternative hypothesis can be directional (e.g., $H_a : \mu > \mu_0$) or non-directional (e.g., $H_a : \mu \ne \mu_0$), but the null hypothesis is always written as an equality (e.g., $H_0 : \mu = \mu_0$).

Your evaluation is correct: the equality null is the mutually exclusive complement only of the non-directional alternative. The appropriate complement of the directional alternative above would properly be $H_0 : \mu \le \mu_0$.

So, why do textbook authors sometimes just always write the null with the equality sign? Well, it comes down to what you can (and cannot) draw. I can draw a picture of a hypothetical world where the population mean is a given value (say $\mu_0$). I can sketch the normal curve, indicate the center is at $\mu_0$, and I'm good to go. What I can't do is draw infinitely many other such curves where $\mu \le \mu_0$.

OK...but won't the $P$-values be different if I drew different curves? Yes, they would, but if you conduct a thought experiment about what the new $P$-value would be if you did have a normal curve with its mean shifted below $\mu_0$, that new $P$-value will always be less than the one you calculated with the fixed null hypothesis.
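The thought experiment can be checked numerically. The sketch below is not part of the original answer: it assumes a known standard deviation (a z test), and the values of `mu0`, `sigma`, `n` and `xbar` are made up for illustration. It computes the right-tail probability of the observed sample mean under several null means at or below $\mu_0$.

```r
# Hypothetical values, chosen only for illustration
mu0   <- 50      # boundary value of the null hypothesis
sigma <- 5       # population standard deviation, assumed known
n     <- 42      # sample size
xbar  <- 51.5    # observed sample mean

# Right-tail probability P(Xbar >= xbar) when the true mean is mu
tail_prob <- function(mu) {
  pnorm(xbar, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)
}

mu_grid <- c(mu0, 49.5, 49, 48, 46)   # null means with mu <= mu0
probs   <- sapply(mu_grid, tail_prob)
print(data.frame(mu = mu_grid, tail_prob = probs))

# The largest tail probability occurs at the boundary mu = mu0
stopifnot(which.max(probs) == 1)
```

Under these assumptions the tail probability shrinks as the null mean moves further below $\mu_0$, so the value computed at $\mu = \mu_0$ is the most conservative one.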

And in the end, technically, I can't calculate a separate $P$-value for each of the infinitely many options indicated in $H_0: \mu \le \mu_0$, but I can calculate one for $H_0 : \mu = \mu_0$. (Well, not unless we go down a Bayesian path...)^**

Hope this helps justify the pedagogic rationale behind this (seemingly) wrong conventional notation.

Footnotes/Comments
^**This comment is based on the more simplistic definition of $P$-value used in most introductory statistics textbooks. A more general definition of the $P$-value can account for this, and is described in another answer below.
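For reference, one common way to write such a more general definition is as a supremum over the whole null set. This is a sketch rather than a quotation from the other answer, and the symbols $T$ (a test statistic for which large values count against $H_0$, such as the sample mean with known variance) and $t_{\text{obs}}$ (its observed value) are introduced here for illustration:

$$p \;=\; \sup_{\mu \le \mu_0} P_{\mu}\left(T \ge t_{\text{obs}}\right) \;=\; P_{\mu_0}\left(T \ge t_{\text{obs}}\right)$$

The second equality holds in the normal-mean setting because the right-tail probability increases with $\mu$, so the supremum is attained at the boundary $\mu = \mu_0$. This is why testing $H_0 : \mu = \mu_0$ gives the same $P$-value as testing $H_0 : \mu \le \mu_0$.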

Related Solutions

Solved – One sided hypothesis testing with two-sided interval

The thing that is confusing you is where to put the lower bound. In the two-sided 90% interval you give, the lower bound is finite (4283.588). If you want a 95% one-sided interval, it has the same upper bound (as you say), but the lower bound is infinite. So it runs from $-\infty$ to 4616.412.
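A quick way to see the shared upper bound is to compute both intervals on the same data. The asker's data are not shown here, so the sketch below uses simulated values (the seed, sample size, mean and standard deviation are assumptions made for illustration) and a one-sample t interval.

```r
# Simulated data for illustration only; the asker's data are not shown here
set.seed(1)
y <- rnorm(30, mean = 4450, sd = 500)

two_sided_90 <- t.test(y, conf.level = 0.90)$conf.int
one_sided_95 <- t.test(y, alternative = "less", conf.level = 0.95)$conf.int

print(two_sided_90)   # finite lower and upper bounds
print(one_sided_95)   # lower bound is -Inf

# Both intervals share the same upper bound
stopifnot(isTRUE(all.equal(two_sided_90[2], one_sided_95[2])))
stopifnot(one_sided_95[1] == -Inf)
```

Both intervals use the same 0.95 quantile of the t distribution for the upper limit, which is why the two upper bounds agree.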

Hypothesis Testing – How to Compare One-Sided and Two-Sided Hypothesis Testing with P-Value

Suppose a previous process for making a particular kind of steel wire yielded wire with breaking strength $\mathsf{Norm}(\mu=50,\sigma=5).$ A new process is now in use and we would like to know if the breaking strength has changed. If different, we have no basis for guessing whether it is higher or lower.

Now $n = 42$ test specimens of the new wire are available, and their breaking strengths have been determined and recorded in the vector x. A change of $2$ or more would be of practical importance.

We wish to use a two-sided, one-sample t test, at the 5% level, of $H_0: \mu=50$ against the alternative $H_a: \mu \ne 50.$ In R, the relevant test gives the following output. The result of this two-sided test is not significant at the 5% level.

t.test(x, mu=50)

        One Sample t-test

data:  x
t = 1.9969, df = 41, p-value = 0.0525
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
 49.97994 53.56558
sample estimates:
mean of x 
  51.77276

Before the specimens from the new process were measured for breaking strength, we used the standard deviation $\sigma=5$ and the important difference $\Delta = 2$ to see how many specimens should be used for the test. We determined that $n=45$ specimens would suffice to give power (the probability of detecting a real difference of size $\Delta=2$) of about 75%. So the test was not 'sure' to give a significant result even if there is a real difference. To make matters a little worse, we got only $n=42$ specimens.

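set.seed(1005)
# simulate 10^5 samples of size 45 with true mean 52; keep each two-sided P-value
pv = replicate(10^5, t.test(rnorm(45, 52, 5), mu=50)$p.value)
mean(pv <= 0.05)   # proportion of rejections at the 5% level = estimated power
[1] 0.74662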
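The simulated power can be cross-checked analytically. The sketch below uses `power.t.test` from base R's stats package with the planning values $\sigma = 5$ and $\Delta = 2$ from the text, and also shows the effect of ending up with 42 specimens instead of 45. The variable names `planned` and `actual` are mine.

```r
# Analytic power of a two-sided one-sample t test, using the planning values
# sigma = 5 and Delta = 2 from the text
planned <- power.t.test(n = 45, delta = 2, sd = 5, sig.level = 0.05,
                        type = "one.sample", alternative = "two.sided")
actual  <- power.t.test(n = 42, delta = 2, sd = 5, sig.level = 0.05,
                        type = "one.sample", alternative = "two.sided")

print(planned$power)   # should be close to the simulated value above
print(actual$power)    # somewhat lower, because n dropped from 45 to 42
```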

Now suppose someone notices that the sample mean $\bar X = 51.77$ is larger than $\mu_0 = 50$ and suggests that we could get a P-value smaller than the magical 5% level by doing a one-sided test, as shown below. The P-value of the right-sided test is half the P-value of the two-sided test.

t.test(x, mu=50, alt="greater")

        One Sample t-test

data:  x
t = 1.9969, df = 41, p-value = 0.02625
alternative hypothesis: true mean is greater than 50
95 percent confidence interval:
 50.27881      Inf
sample estimates:
mean of x 
 51.77276
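The halving can be reproduced directly from the t statistic and degrees of freedom printed in the output. This is a minimal sketch: the numbers are copied from the output above, and the variable names are mine.

```r
# Values copied from the t.test output above
t_stat <- 1.9969
dof    <- 41

p_right <- pt(t_stat, dof, lower.tail = FALSE)           # one-sided, H_a: mu > 50
p_two   <- 2 * pt(abs(t_stat), dof, lower.tail = FALSE)  # two-sided

print(c(one_sided = p_right, two_sided = p_two))

# Had the t statistic been negative (sample mean below 50), the right-sided
# P-value would have been larger than 0.5 rather than half the two-sided one
p_right_if_negative <- pt(-t_stat, dof, lower.tail = FALSE)
print(p_right_if_negative)
```

The halving therefore holds only when the observed mean falls on the side named in the alternative, which is one reason the direction has to be chosen before looking at the data.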

There are several things wrong with using this one-sided test to declare that the new process differs significantly from the old one. Here are a few.

- We set out to test for a change in either direction. Now a second analysis of the same data has 'declared' an increase with significance barely below the 5% level. This is "P-hacking," which can lead to "false discovery."
- The 95% confidence interval for $\mu$ from the two-sided test is $(49.98,\, 53.57),$ which includes the hypothetical value 50 (if only just barely).
- The actual difference between $\mu=50$ and $\bar X = 51.77$ is less than the 2 units we said is of practical importance.
- We had planned a somewhat skimpy sample size of 45 in our power computation and finally had only 42 available. Maybe the new process is different from the old, and maybe not. We don't have enough data to say it is.
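The interval-based points can be checked from the printed summary alone. The sketch below recovers the standard error from the reported mean and t statistic, then rebuilds the two-sided interval and the one-sided lower bound. Because the inputs are rounded values copied from the output, the results agree with the printed intervals only up to rounding.

```r
xbar   <- 51.77276   # sample mean from the output
t_stat <- 1.9969     # t statistic from the output
dof    <- 41
mu0    <- 50

se <- (xbar - mu0) / t_stat                          # standard error recovered from t
ci_two   <- xbar + c(-1, 1) * qt(0.975, dof) * se    # two-sided 95% interval
lower_1s <- xbar - qt(0.95, dof) * se                # one-sided 95% lower bound

print(ci_two)       # compare with the two-sided interval in the output
print(lower_1s)     # compare with the one-sided lower bound in the output
print(xbar - mu0)   # observed difference, to compare with Delta = 2
```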

Note: The fictitious data used above was sampled in R as shown below. Of course, in a real-life application the exact population parameters would never be known.

set.seed(2021)
x = rnorm(42, 52, 5)
