Hypothesis Testing – Uniformly Most Powerful Test for Weibull Distribution

Tags: hypothesis-testing, mathematical-statistics, self-study, statistical-power, weibull-distribution

$\newcommand{\szdp}[1]{\!\left(#1\right)} \newcommand{\szdb}[1]{\!\left[#1\right]}$
Problem Statement: Let $Y_1,\dots,Y_n$ be a random sample from the probability
density function given by
$$f(y|\theta)=
\begin{cases}
\dfrac1\theta\,m\,y^{m-1}\,e^{-y^m/\theta},&y>0\\
0,&\text{elsewhere}
\end{cases}
$$

with $m$ denoting a known constant.

  1. Find the uniformly most powerful test for testing
    $H_0:\theta=\theta_0$ against $H_a:\theta>\theta_0.$
  2. If the test in 1. is to have $\theta_0=100, \alpha=0.05,$ and
    $\beta=0.05$ when $\theta_a=400,$ find the appropriate sample size and
    critical region.

Note 1: This is Problem 10.80 in Mathematical Statistics with Applications, 5th ed., by Wackerly, Mendenhall, and Scheaffer.

Note 2: This is cross-posted here.

My Work So Far:

  1. This is a Weibull distribution. We construct the
    likelihood function
    $$L(\theta)=\szdp{\frac{m}{\theta}}^{\!\!n}\szdb{\prod_{i=1}^ny_i^{m-1}}
    \exp\szdb{-\frac1\theta\sum_{i=1}^ny_i^m}.$$

    Now we form the inequality indicated in the Neyman-Pearson Lemma:
    \begin{align*}
    \frac{L(\theta_0)}{L(\theta_a)}&<k\\
    \frac{\displaystyle \szdp{\frac{m}{\theta_0}}^{\!\!n}\prod_{i=1}^ny_i^{m-1}
    \exp\szdb{-\frac{1}{\theta_0}\sum_{i=1}^ny_i^m}}
    {\displaystyle \szdp{\frac{m}{\theta_a}}^{\!\!n}\prod_{i=1}^ny_i^{m-1}
    \exp\szdb{-\frac{1}{\theta_a}\sum_{i=1}^ny_i^m}}&<k\\
    \frac{\displaystyle \theta_a^n
    \exp\szdb{-\frac{1}{\theta_0}\sum_{i=1}^ny_i^m}}
    {\displaystyle \theta_0^n
    \exp\szdb{-\frac{1}{\theta_a}\sum_{i=1}^ny_i^m}}&<k\\
    \frac{\theta_a^n}{\theta_0^n}\,\exp\szdb{-\frac{\theta_a-\theta_0}
    {\theta_0\theta_a}\sum_{i=1}^ny_i^m}&<k\\
    n\ln(\theta_a/\theta_0)-\frac{\theta_a-\theta_0}
    {\theta_0\theta_a}\sum_{i=1}^ny_i^m&<\ln(k)\\
    n\ln(\theta_a/\theta_0)-\ln(k)&<\frac{\theta_a-\theta_0}
    {\theta_0\theta_a}\sum_{i=1}^ny_i^m.
    \end{align*}

    The end result is
    $$\sum_{i=1}^ny_i^m>\frac{\theta_0\theta_a}{\theta_a-\theta_0}
    \szdb{n\ln(\theta_a/\theta_0)-\ln(k)},$$

    or
    $$\sum_{i=1}^ny_i^m>k'.$$
    Since this rejection region does not depend on the particular alternative
    $\theta_a>\theta_0,$ the same test is most powerful against every such
    alternative, and hence is uniformly most powerful. (A numeric check of
    this reduction appears after this list.)
  2. We need to determine the distribution of $\displaystyle \sum_{i=1}^ny_i^m.$
    I claim that the random variable $W=Y^m$ is exponentially distributed with
    mean $\theta.$ Proof:
    \begin{align*}
    f_W(w)
    &=f\szdp{w^{1/m}}\frac{dw^{1/m}}{dw}\\
    &=\frac{m}{\theta}\,(w^{1/m})^{m-1}\,e^{-w/\theta}\szdp{\frac1m}\,w^{(1/m)-1}\\
    &=\frac1\theta\,w^{1-1/m}e^{-w/\theta}\,w^{(1/m)-1}\\
    &=\frac1\theta\,e^{-w/\theta},
    \end{align*}

    which is the density of an exponential with mean $\theta,$ as I
    claimed. It follows, then, that $\displaystyle\sum_{i=1}^ny_i^m$ is
    $\Gamma(n,\theta)$ distributed, and hence that
    $\displaystyle\frac{2}{\theta}\sum_{i=1}^ny_i^m$ is $\chi^2$ distributed with
    $2n$ d.o.f. So we can write the rejection region as
    $$\frac{2}{\theta_0}\sum_{i=1}^ny_i^m>\chi_\alpha^2,$$
    where $\chi_\alpha^2$ cuts off an upper-tail area of $\alpha$ under the
    $\chi^2$ distribution with $2n$ d.o.f. (A simulation check of this
    distributional claim appears after this list.) Let
    $$U(\theta)=\frac{2}{\theta}\sum_{i=1}^ny_i^m.$$
    Then we have
    \begin{align*}
    \alpha&=P\szdp{U(\theta_0)>\chi_\alpha^2}\\
    \beta&=P\szdp{U(\theta_a)<\chi_{1-\beta}^2},
    \end{align*}

    where $\chi_{1-\beta}^2$ denotes the value cutting off a lower-tail area
    of $\beta$ (equivalently, an upper-tail area of $1-\beta$).

    Both conditions pin down the same boundary value of
    $\displaystyle\sum_{i=1}^ny_i^m,$ so we solve
    \begin{align*}
    \frac{2}{\theta_0}\sum_{i=1}^ny_i^m&=\chi_\alpha^2\\
    \frac{2}{\theta_a}\sum_{i=1}^ny_i^m&=\chi_{1-\beta}^2\\
    \frac{\chi_\alpha^2\theta_0}{2}&=\frac{\chi_{1-\beta}^2\theta_a}{2}\\
    \frac{\chi_\alpha^2}{\chi_{1-\beta}^2}&=\frac{\theta_a}{\theta_0}.
    \end{align*}

    So we choose $n$ so that the $\chi^2$ values achieve this ratio. Here
    $\theta_a/\theta_0=4,$ and we vary $n$ until $\chi_\alpha^2$ (on the high
    end) and $\chi_{1-\beta}^2$ (on the low end) have a ratio of $4.$ This
    happens at d.o.f. $14=2n,$ which means we must choose $n=7.$ For this
    choice of $n,$ the critical region is
    $$\frac{2}{\theta_0}\sum_{i=1}^ny_i^m>23.6848.$$
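
Two quick numeric sanity checks on the above (minimal sketches in Python with numpy/scipy; the choice $m=2$ and all variable names are mine, not part of the problem). First, the Neyman-Pearson reduction in part 1: the log of $L(\theta_0)/L(\theta_a)$ computed directly from the densities should match the reduced form and decrease in $\sum_{i=1}^ny_i^m$ for any $\theta_a>\theta_0$:

```python
import numpy as np

m, n = 2.0, 5                      # m is the known constant (arbitrary choice)
theta0, thetaa = 100.0, 400.0

def log_lik(y, theta):
    # log L(theta) for f(y|theta) = (m/theta) y^(m-1) exp(-y^m / theta)
    return n * np.log(m / theta) + (m - 1) * np.log(y).sum() - (y ** m).sum() / theta

rng = np.random.default_rng(1)
for _ in range(3):
    # Y = theta^(1/m) * Z with Z standard Weibull(shape m) has density f(y|theta)
    y = theta0 ** (1 / m) * rng.weibull(m, n)
    t = (y ** m).sum()
    direct = log_lik(y, theta0) - log_lik(y, thetaa)
    reduced = n * np.log(thetaa / theta0) - (thetaa - theta0) / (theta0 * thetaa) * t
    print(t, direct, reduced)      # direct == reduced; linear and decreasing in t
```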

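Second, the distributional claim in part 2, namely that $U(\theta)=\frac{2}{\theta}\sum_{i=1}^nY_i^m\sim\chi^2_{2n}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, theta, n, reps = 2.0, 100.0, 7, 20_000

# Sample reps datasets of size n from f(y|theta) via the standard Weibull
y = theta ** (1 / m) * rng.weibull(m, size=(reps, n))
u = (2.0 / theta) * (y ** m).sum(axis=1)   # U(theta) for each dataset

print(stats.kstest(u, "chi2", args=(2 * n,)))              # p-value should not be small
print(np.quantile(u, 0.95), stats.chi2.ppf(0.95, 2 * n))   # nearly equal
```
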
My Question: This is one of the most complicated stats problems I've encountered yet in this textbook, and I just want to know if my solution is correct. I feel like I'm "out on a limb" with complex reasoning depending on complex reasoning. I'm fairly confident that part 1 is correct, but what about part 2?

Best Answer

You've got $L=\frac{\theta_a^n}{\theta_0^n}\,\exp\left({-\frac{\theta_a-\theta_0} {\theta_0\theta_a}\sum_{i=1}^ny_i^m}\right)<k$, which is good. Now we take the log of both sides:

$$n \log\left(\frac{\theta_a}{\theta_0}\right)+\left(\frac{\theta_0-\theta_a}{\theta_0\theta_a}\right)\sum_{i=1}^{n}{y_i^m} < \log(k)$$

and so, after dividing through by $\frac{\theta_0-\theta_a}{\theta_0\theta_a}<0$ (which flips the inequality), the rejection region takes the form: $$\left\{ \sum_{i=1}^{n}{y_i^m} > c \right\}.$$

Now, for part (b), there's something to note here: $y^m$ has an exponential distribution with mean $\theta$, and so $\sum{y^m_i}\sim \Gamma(n,\theta)$. Under the null, the statistic $\frac{2\sum_{i=1}^{n}{y_i^m}}{\theta_0}$ has a $\chi^2$ distribution with $2n$ degrees of freedom (look for the relation between gamma and chi-squared), so the rejection region can be calibrated through $\frac{2\sum_{i=1}^{n}{y_i^m}}{\theta_0} > \frac{2c}{\theta_0}$.
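
The gamma/chi-squared relation used here is easy to confirm with scipy (a minimal check; $\Gamma(n,\theta)$ is taken as shape $n$, scale $\theta$, matching the parameterization above):

```python
from scipy import stats

n, theta = 6, 100.0

# If T ~ Gamma(shape=n, scale=theta), then 2T/theta ~ chi-square(2n):
t95 = stats.gamma.ppf(0.95, a=n, scale=theta)
print(2 * t95 / theta)               # ≈ 21.026
print(stats.chi2.ppf(0.95, 2 * n))   # ≈ 21.026 as well
```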

Now let's solve (b):

$$\theta_0=100,\theta_a=400,\alpha=0.05,\beta=0.05$$

When $H_0$ is true, we get $\alpha$ using:

$$\alpha=P\left(\frac{2\sum_{i=1}^{n}{y_i^m}}{100} > \chi^2_{0.05}\right)=0.05.$$

When $H_a$ is true, we get $\beta$ using:

$$\beta=P\left(\frac{2\sum_{i=1}^{n}{y_i^m}}{100} \le \chi^2_{0.05} \middle| \theta=400\right)=P\left(\frac{2\sum_{i=1}^{n}{y_i^m}}{400} \le \frac{1}{4}\chi^2_{0.05} \middle| \theta=400\right)=P\left(\chi^2\le\frac{1}{4}\chi^2_{0.05}\right)=0.05$$

So, we need to find the row in the $\chi^2$ table where $\frac{1}{4}\chi^2_{0.05}=\chi^2_{0.95}$:

[Image: chi-squared distribution table]

You can see that for $12$ degrees of freedom, $\chi^2_{0.95}=5.226$ and $\chi^2_{0.05}=21.03$, which is the closest we get to achieving $\frac{1}{4}\chi^2_{0.05}=\chi^2_{0.95}$. Recall that this has $2n$ degrees of freedom, so the appropriate sample size is $n=6$, and the critical region is $\frac{2}{100}\sum_{i=1}^{6}{y_i^m} > 21.03$, equivalently $\sum_{i=1}^{6}{y_i^m} > 1051.5$.
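
For completeness, here's a scipy sketch that reproduces the search over $n$ and spells out the critical region (the scan range and variable names are mine):

```python
from scipy import stats

theta0, thetaa, alpha, beta = 100.0, 400.0, 0.05, 0.05
target = thetaa / theta0   # = 4

# Scan even d.o.f. for the best match to chi2_upper(alpha) / chi2_lower(beta) = 4
df = min(
    range(2, 31, 2),
    key=lambda d: abs(stats.chi2.ppf(1 - alpha, d) / stats.chi2.ppf(beta, d) - target),
)
n = df // 2
crit = stats.chi2.ppf(1 - alpha, df)   # upper 5% point with 2n d.o.f.
print(n, df, crit)                     # 6 12 21.026...
print("reject when sum(y_i^m) >", theta0 * crit / 2)   # ≈ 1051.3
```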