How many cases are needed to get enough power

probability, statistics

The following quizzes are a rough translation (with modifications) of Quizzes No.03-1-(1) and No.03-1-(2) of the exam of the "2019's semi-first grade of Japan Statistical Society Certificate (JSSC)" (see the section 【Quizzes and Official Answers】 and ref. 1). The official JSSC answers are given under each quiz.

However, I don't know how to get the correct answer.

【My question】:

  • How can I reach the correct answers?
  • In both Quiz 1 and Quiz 2, the alternative hypothesis is not the negation of the null hypothesis. Is this an acceptable way to set up the null and alternative hypotheses?

【Quizzes and Official Answers】:
We consider adverse events that occur during a clinical trial and post-marketing surveillance.

  • Let $N$ be the sample size of the clinical trial.
  • Let $p$ be the probability of the adverse event in the population ($0\le p \le 1$).
  • Let $\hat{p}$ be the incidence ratio of the adverse event estimated from the clinical trial ($0\le \hat{p} \le 1$). Hereinafter, $\hat{p}$ is referred to as the estimated incidence ratio.

We also assume that $N$ is large enough that $\hat{p}$ approximately follows a normal distribution.

Then answer the following Quiz 1 and Quiz 2.

(Quiz1)
A one-sided test is performed on this clinical trial under the following conditions.

  • $N = 475$ cases,
  • the null hypothesis ${H}_{0}$ is "$p = 0.05$", and
  • the alternative hypothesis ${H}_{1}$ is "$p > 0.05$".

We write $P(\hat{p}>q \mid N,p)$ for the probability that $\hat{p}>q$ under the null hypothesis.

Calculate $P(\hat{p}>0.0733 \mid N,p)$ and select the best answer from the following choices.

① 0.01, ② 0.025, ③ 0.05, ④ 0.1, ⑤ 0.2
→Ans1: ① (0.01)

(Quiz2) We plan to perform a one-sided test at a significance level of $2.5\%$ for the post-marketing surveillance, under the following ${H}_{0}$ and ${H}_{1}$.

  • The null hypothesis ${H}_{0}$: $p = 0.05$, and
  • The alternative hypothesis ${H}_{1}$: $p = 0.1$.

How many cases are required for this post-marketing surveillance to achieve a power of $90\%$?
Select the best answer from the following choices:

① 114, ② 164, ③ 214, ④ 264, ⑤ 314

→ Ans2: ④ (264 cases)

References:
Ref. 1: Quiz No.3 of the exam of the "2019's semi-first grade of Japan Statistical Society Certificate" is available at the following URL (written in Japanese). It is an excerpt containing only the part related to this quiz. Link

P.S. I'm not very good at English, so I apologize for any impolite or unclear expressions. Corrections and English review are welcome. (You may edit my question and description to improve them.)


Post-hoc notes:
【My answer to quiz 1】 :
Rocco gave his answer to Quiz 1 on September 6th (JST). At first, as he said, it was not consistent with the official answer, and neither he nor I could find a mistake in it.
Edit: After discussing it with Rocco, both my result and his result for Quiz 1 agree with the official answer.

I tried a different method from his, namely the
test of a population ratio (sorry, the linked document is written in Japanese).

We define the test statistic $Z$ as follows.

$$Z:= \frac{\hat{p}-p}{\sqrt{p(1-p)/N}}. \ \ (eq. P.H.N 1-01)$$

Following the test of a population ratio (sorry, the linked document is written in Japanese), the $Z$ of (eq. P.H.N 1-01) follows the standard normal distribution.

Substituting $p=0.05$ and $N=475$ gives

$$Z = \frac{\hat{p}-0.05}{\sqrt{0.05(1-0.05)/475}}=100(\hat{p}-0.05) . \ \ (eq. P.H.N 1-02)$$

Here, $\hat{p} > 0.0733$; therefore,

$$Z > 100(0.0733-0.05) = 2.33 \ \ (eq. P.H.N 1-03)$$

On the other hand, using the Excel worksheet function
"=1-NORMDIST(2.33,0,1,TRUE)", the result is "0.009903"; therefore,

$$P (Z > 2.33) = 0.009903. \ \ (eq. P.H.N 1-04)$$

Therefore, among the choices, ① (0.01) is the closest.
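
The same tail probability can be checked outside Excel. The following is only a minimal sketch in Python using the standard library; the variable names are my own and are not part of the quiz.

```python
from math import sqrt
from statistics import NormalDist

N = 475      # sample size from Quiz 1
p0 = 0.05    # incidence under the null hypothesis
q = 0.0733   # threshold for the estimated incidence ratio

# Standard error of p-hat under H0 (normal approximation)
se = sqrt(p0 * (1 - p0) / N)

# Test statistic corresponding to p-hat = q
z = (q - p0) / se

# Upper-tail probability P(Z > z) for a standard normal Z
tail = 1 - NormalDist().cdf(z)

print(f"se = {se:.4f}, z = {z:.2f}, P(Z > z) = {tail:.4f}")
```

Here `NormalDist().cdf` plays the role of `NORMDIST(..., 0, 1, TRUE)` in the worksheet formula.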

【My answer to Quiz 2】: (But it does not match the official answer.)
To clear my head while investigating Quiz 2, I made a schematic illustration (see Fig.PFN2-1).

Fig.PFN2-1

The legend of Fig.PFN2-1 is as follows.

  • The sky-blue line (0) represents the distribution under the null hypothesis.
  • The orange line (1) represents the distribution under the alternative hypothesis.
  • μ0 represents the mean of (0).
  • μ1 represents the mean of (1).
  • a represents the upper α point of the distribution (0).
  • The dark-blue hatched area represents α.
  • The sky-blue hatched area represents β.
  • The orange hatched area (including the dark-blue area) represents (1-β).

Using the above figure, I tried the calculation as follows.

First, under the null hypothesis, the test statistic ${Z}_{0}$ in (eq. P.H.N 2-01) follows a standard normal distribution, where ${p}_{0}$ is the probability of the adverse event in the population under ${H}_{0}$. From the problem statement, ${p}_{0}=0.05$.

$${Z}_{0}:= \frac{\hat{p}-{p}_{0}}{\sqrt{{p}_{0}(1-{p}_{0})/N}}. \ \ (eq. P.H.N 2-01)$$

Therefore, the average ${\mu}_{0}$ and the standard deviation $ {\sigma}_{0}$ of the distribution (0) are as follows.

$${\sigma}_{0}= \sqrt{{p}_{0}(1-{p}_{0})/N} =0.218/\sqrt{N} \ \ (eq. P.H.N 2-02)$$
$${\mu}_{0}= {p}_{0}{\sigma}_{0} =0.05\cdot 0.218/\sqrt{N}=0.0109/\sqrt{N} \ \ (eq. P.H.N 2-03)$$

Now, α = 0.025, so using the Excel worksheet function, the upper α point of the standard normal distribution, ${Z}_{\alpha}$, is given by (eq. P.H.N 2-04):

${Z}_{\alpha}$ =1-NORMINV(0.025,0,1)= 2.96 (eq. P.H.N 2-04)

Therefore, the "a" in the figure is given by the following formula.

$$a={Z}_{\alpha}*{\sigma}_{0} + {\mu}_{0} ={Z}_{\alpha}{\sigma}_{0} + {p}_{0}{\sigma}_{0} \ \ (eq. P.H.N 2-05)$$

Next, under the alternative hypothesis, the test statistic ${Z}_{1}$ in (eq. P.H.N 2-06) follows a standard normal distribution, where ${p}_{1}$ is the probability of the adverse event in the population under ${H}_{1}$. From the problem statement, ${p}_{1}=0.1$.

$${Z}_{1}:= \frac{\hat{p}-{p}_{1}}{\sqrt{{p}_{1}(1-{p}_{1})/N}} \ \ (eq. P.H.N 2-06)$$

Therefore, the average ${\mu}_{1}$ and the standard deviation ${\sigma}_{1}$ of the distribution (1) are as follows.

$${\sigma}_{1}= \sqrt{{p}_{1}(1-{p}_{1})/N} = 0.3/ \sqrt{N} \ \ (eq. P.H.N 2-07)$$
$${\mu}_{1}= {p}_{1}{\sigma}_{1} =0.1\cdot 0.3/ \sqrt{N} =0.03/ \sqrt{N} \ \ (eq. P.H.N 2-08)$$

Substituting $a$ for $\hat{p}$ in (eq. P.H.N 2-06) gives
$${Z}_{1,a} = \frac{{\sigma}_{0}({Z}_{\alpha} + {p}_{0})-{p}_{1}}{{\sigma}_{1}}
= \frac{{\sigma}_{0}}{{\sigma}_{1}}({Z}_{\alpha} + {p}_{0})-{p}_{1}/{\sigma}_{1}
\ \ (eq. P.H.N 2-09)$$

Here,
$${\sigma}_{0}/{\sigma}_{1}= \frac{0.218/\sqrt{N}}{0.300/\sqrt{N}} = 0.727\ \ (eq. P.H.N 2-10)$$

On the other hand, from the problem statement, ${Z}_{1,a}$ is the upper 90% point of the standard normal distribution. Therefore,

${Z}_{1,a}$=1-NORMINV(0.9,0,1)= -0.28155 (eq. P.H.N 2-13)

Therefore,
$N$ =((1-NORMINV(0.9,0,1)+2.15)/0.17)^2 = 10.99087^2 = 120.7993 (eq. P.H.N 2-14)

【My answer to Quiz 2-2】: (It is consistent with the official answer; added on 2020.04.11 (JST))
Let $n$ be the number of cases, the null hypothesis ($H_0$) be "$p = 0.05$", and the alternative hypothesis ($H_1$) be "$p = 0.1$".
Under these settings, we conduct a one-sided test.
Under $H_0$ and $H_1$, the estimated incidence ratio ($\hat{p}$) approximately follows $N(0.05,0.05\times 0.95/ n)$ and $N(0.1,0.1\times 0.9/n)$, respectively.

Under this approximation, if the power is set to 90%, the critical point for rejection is given by eq. P.H.N 3-01.
$$0.1- {Z}_{0.9}\sqrt{0.1\times 0.9 /n} \tag{P.H.N 3-01}$$

Likewise, under the same normal approximation, if the significance level is set to 2.5%, the critical point for rejection is given by eq. P.H.N 3-02.
$$0.05 + {Z}_{0.975}\sqrt{0.05\times 0.95 /n} \tag{P.H.N 3-02}$$

Here, ${Z}_{0.975}$ and ${Z}_{0.9}$ denote the lower 97.5% point and the lower 90% point of the standard normal distribution, respectively; ${Z}_{0.975}=1.96$ and ${Z}_{0.9}=1.28$.

From the above, for a power of 90% and a significance level of 2.5%, the two critical points must coincide, so eq. P.H.N 3-03 must be approximately satisfied.
$$0.1- {Z}_{0.9}\sqrt{0.1\times 0.9 /n}=0.05 + {Z}_{0.975}\sqrt{0.05\times 0.95 /n} \tag{P.H.N 3-03}$$

Solving P.H.N 3-03 for $n$ gives P.H.N 3-04.
$$n =
\frac{{(1.96\sqrt{0.05\times 0.95}+
1.28\sqrt{0.1\times 0.9})}^{2}}{{0.05^2}} = 263.2
\tag{P.H.N 3-04}$$
Rounding up gives 264 cases, which matches the official answer ④.
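
As a numerical check of P.H.N 3-04, here is a minimal Python sketch. The function name `required_cases` and its parameters are hypothetical, introduced only for this illustration. It uses unrounded normal quantiles from the standard library instead of 1.96 and 1.28, so the value it prints differs slightly from 263.2, but it still rounds up to 264.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_cases(p0, p1, alpha, power):
    """Cases needed for a one-sided test of H0: p = p0 against H1: p = p1 (p1 > p0),
    using the normal approximation with separate variances under H0 and H1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # lower (1 - alpha) point, about 1.96 here
    z_power = NormalDist().inv_cdf(power)      # lower 90% point, about 1.28 here
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_power * sqrt(p1 * (1 - p1))
    return (numerator / (p1 - p0)) ** 2

n = required_cases(p0=0.05, p1=0.10, alpha=0.025, power=0.90)
print(n, ceil(n))
```

The formula inside the function is what one gets by equating the critical point under $H_0$ with the critical point under $H_1$ and solving for $n$.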

Best Answer

I tried solving the first one, but I get the wrong result (EDIT: NOW IT'S CORRECT). I ended up taking a somewhat cumbersome route; if anyone can point out what I did wrong, I would be glad. I leave it here because it may be useful to the OP anyway!

Let's start from the first problem.

Step 1: we write the probability $P(k|N,p)$: "probability that we observe $k$ adverse events out of $N$ total events, given that the probability for an event to be adverse is $p$". This is an elementary result (I can provide details if you want): \begin{align}P(k|N,p) = {N\choose k} p^k (1-p)^{N-k}\tag 1\end{align}

Step 2: Your problem hints at $N$ being "large enough" for $\hat p$ to be "approximately normally distributed". So let's start by finding an approximation of $P(k|N,p)$ for large $N$. We'll use Stirling's approximation: $\log N! \approx N\log N - N$ and get: \begin{align} P(k|N,p) =& {N\choose k} p^k (1-p)^{N-k} \\ \quad\\ \approx &\exp\{N\log N - k\log k - (N-k)\log (N-k) + k \log p + (N-k)\log (1-p) \} =\\ \quad\\=&\exp\left\{-N\left[\frac{k}{N}\log(\frac{k/N}{p}) + \frac{N-k}{N} \log (\frac{(N-k)/N }{1-p})\right]\right\}\\ \quad \\=& e^{-N \; D_{KL}(P_{\text{emp}}||\mathcal B(p))} \tag 2\end{align} where:

  • $P_{\text{emp}}$ is the "empirical" distribution: $\{P(\text{adverse}) = \frac{k}{N}, P(\text{non adverse}) = \frac{N-k}{N}\}$
  • $\mathcal B(p)$ is the Bernoulli distribution: $\{P(\text{adverse}) = p, P(\text{non adverse}) = 1-p\}$
  • $D_{KL}(P_{\text{emp}}||\mathcal B(p))$ is the Kullback-Leibler divergence (loosely, a "distance") between those two distributions. It is defined to be the expression in square brackets in the previous line of (2).
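
To see how good this leading-order approximation is, one can compare $-\frac{1}{N}\log P(k|N,p)$ with $D_{KL}$ numerically. The sketch below is only an illustration: the helper names and the count $k=35$ (chosen near the $0.0733$ threshold) are my own assumptions. Since the simplified Stirling formula drops a factor that is polynomial in $N$, the two quantities should agree only up to a gap that shrinks as $N$ grows with $k/N$ held fixed.

```python
from math import lgamma, log

def kl_bernoulli(x, p):
    """Bracketed expression in (2): D_KL between Bernoulli(x) and Bernoulli(p)."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

def log_binomial_pmf(k, N, p):
    """Exact log of equation (1), computed with log-gamma to avoid overflow."""
    log_choose = lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
    return log_choose + k * log(p) + (N - k) * log(1 - p)

p = 0.05
gaps = []
for scale in (1, 10, 100):
    N, k = 475 * scale, 35 * scale           # keep k/N fixed while N grows
    exact_rate = -log_binomial_pmf(k, N, p) / N
    gap = exact_rate - kl_bernoulli(k / N, p)
    gaps.append(gap)
    print(N, exact_rate, gap)
```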

Step 3: Let's now find our best estimator for $p$ given a set of $N$ observations comprising $k$ adverse events. We can compute the probability $P(p|k,N)$ via Bayes' rule: \begin{align}P(p|k,N) = \frac{P(k|p,N)P(p|N)}{P(k|N)}\tag 3 \end{align} We are interested in finding the value $\hat p$ maximizing this probability. We'll do it by computing the derivative of this expression and setting it to zero. We don't care about the denominator ${P(k|N)}$ since it does not depend on $p$. We also drop $P(p|N)$ since, lacking any prior information, it is uniform in $p$ (so it acts as a multiplicative constant). The only term remaining is $P(k|p,N)$, for which we have the $N \gg 1$ approximation from step 2! So we have: \begin{align} P(p|k,N) \propto e^{-N \; D_{KL}(P_{\text{emp}}||\mathcal B(p))}\tag 4 \end{align} If we compute its derivative with respect to $p$ and set it to $0$ we find (I omit the computation): \begin{align}\hat p = \frac{k}{N}\tag 5 \end{align}

Step 4: we further approximate $P(k|p,N)$, making it Gaussian. To do this, we must Taylor-expand the exponent with respect to the variable $x=\frac{k}{N}$, around the point $x=p$. Terms of order $0$ and $1$ all cancel out, and we end up with (if you want, I will provide details of the computation): \begin{align} P(k|p,N) \propto e^{-\frac{\left(\frac{k}{N}-p\right)^2}{2\left(\frac{p(1-p)}{N}\right)}}\tag 6 \end{align} We can normalize this: \begin{align} P\left(\frac{k}{N} = x\; \bigg|\; p,N\right) \approx \frac{1}{\sqrt{2\pi \frac{p(1-p)}{N}}}e^{-\frac{\left(x-p\right)^2}{2\left(\frac{p(1-p)}{N}\right)}}\tag 7 \end{align}

Notice: this can be seen as an instance of the Central Limit Theorem!
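
For readers who want the expansion omitted in Step 4, here is a sketch. The shorthand $D(x)$ is introduced only here; it stands for the bracketed expression in (2) with $x = k/N$.

\begin{align}
D(x) &= x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}, & D(p) &= 0,\\
D'(x) &= \log\frac{x}{p} - \log\frac{1-x}{1-p}, & D'(p) &= 0,\\
D''(x) &= \frac{1}{x} + \frac{1}{1-x}, & D''(p) &= \frac{1}{p(1-p)}.
\end{align}

So, to second order, $D(x) \approx \frac{(x-p)^2}{2p(1-p)}$, and $e^{-N D(x)}$ becomes the Gaussian in (6) with variance $\frac{p(1-p)}{N}$.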

Step 5: we solve the problem. Since we now have $\hat p = \frac{k}{N}$ and an expression for $P\left(\frac{k}{N} = x\; \bigg|\; p,N\right)$, we can see that the probability of $\hat p > 0.0733$, under the null hypothesis $p=0.05$ and with $N=475$, will be the integral:

\begin{align} P(\hat p > 0.0733) =& \int_{0.0733}^1 dx \frac{1}{\sqrt{2\pi \frac{0.05(1-0.05)}{475}}}e^{-\frac{\left(x-0.05\right)^2}{2\left(\frac{0.05(1-0.05)}{475}\right)}} \\&=\frac{1}{0.01\sqrt{2\pi}}\int_{0.0733}^1 dx e^{-\frac{\left(x-0.05\right)^2}{2\left(0.0001\right)}}\tag 8 \end{align}

We can perform the change of variable: $z=\frac{x-0.05}{0.01\sqrt 2}$ to get: \begin{align}P(\hat p > 0.0733)=& \int_{\frac{2.33}{\sqrt 2}}^{\frac{95}{\sqrt 2}} \frac{e^{-z^2}}{\sqrt \pi} dz= \frac{1}{2} \left[erf\left(\frac{95}{\sqrt 2}\right)-erf\left( \frac{2.33}{\sqrt 2} \right)\right] \\ \approx& \frac{1}{2}erfc\left(\frac{2.33}{\sqrt 2}\right)\approx 0.01\tag 9 \end{align}
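
The value in (9) can be checked numerically, and the Gaussian tail can be compared with the exact binomial tail obtained by summing (1). This is a minimal sketch with the Python standard library; the variable names are my own. The exact tail need not coincide with the Gaussian value, since (7) is only an approximation.

```python
from math import comb, erf, erfc, floor, sqrt

N, p, q = 475, 0.05, 0.0733

# Equation (9): Gaussian tail written with the error function
lower = 2.33 / sqrt(2)
upper = 95 / sqrt(2)
gauss_tail = 0.5 * (erf(upper) - erf(lower))
gauss_tail_erfc = 0.5 * erfc(lower)   # valid because erf(upper) is essentially 1

# Exact tail from equation (1): sum over all counts k with k/N > q
k_min = floor(q * N) + 1
exact_tail = sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(k_min, N + 1))

print(gauss_tail, gauss_tail_erfc, exact_tail)
```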
