Hypothesis Testing – Uniformly Most Powerful Test for Weibull Distribution

Tags: hypothesis-testing, mathematical-statistics, self-study, statistical-power, weibull-distribution

$\newcommand{\szdp}[1]{\!\left(#1\right)} \newcommand{\szdb}[1]{\!\left[#1\right]}$
Problem Statement: Let $Y_1,\dots,Y_n$ be a random sample from the probability
density function given by
$$f(y|\theta)=
\begin{cases}
\dfrac1\theta\,m\,y^{m-1}\,e^{-y^m/\theta},&y>0\\
0,&\text{elsewhere}
\end{cases}
$$

with $m$ denoting a known constant.

  1. Find the uniformly most powerful test for testing
    $H_0:\theta=\theta_0$ against $H_a:\theta>\theta_0.$
  2. If the test in 1. is to have $\theta_0=100, \alpha=0.05,$ and
    $\beta=0.05$ when $\theta_a=400,$ find the appropriate sample size and
    critical region.

Note 1: This is Problem 10.80 in Mathematical Statistics with Applications, 5th ed., by Wackerly, Mendenhall, and Scheaffer.

Note 2: This is cross-posted here.

My Work So Far:

  1. This is a Weibull distribution. We construct the
    likelihood function
    $$L(\theta)=\szdp{\frac{m}{\theta}}^{\!\!n}\szdb{\prod_{i=1}^ny_i^{m-1}}
    \exp\szdb{-\frac1\theta\sum_{i=1}^ny_i^m}.$$

    Now we form the inequality indicated in the Neyman-Pearson Lemma:
    \begin{align*}
    \frac{L(\theta_0)}{L(\theta_a)}&<k\\
    \frac{\displaystyle \szdp{\frac{m}{\theta_0}}^{\!\!n}\prod_{i=1}^ny_i^{m-1}
    \exp\szdb{-\frac{1}{\theta_0}\sum_{i=1}^ny_i^m}}
    {\displaystyle \szdp{\frac{m}{\theta_a}}^{\!\!n}\prod_{i=1}^ny_i^{m-1}
    \exp\szdb{-\frac{1}{\theta_a}\sum_{i=1}^ny_i^m}}&<k\\
    \frac{\displaystyle \theta_a^n
    \exp\szdb{-\frac{1}{\theta_0}\sum_{i=1}^ny_i^m}}
    {\displaystyle \theta_0^n
    \exp\szdb{-\frac{1}{\theta_a}\sum_{i=1}^ny_i^m}}&<k\\
    \frac{\theta_a^n}{\theta_0^n}\,\exp\szdb{-\frac{\theta_a-\theta_0}
    {\theta_0\theta_a}\sum_{i=1}^ny_i^m}&<k\\
    n\ln(\theta_a/\theta_0)-\frac{\theta_a-\theta_0}
    {\theta_0\theta_a}\sum_{i=1}^ny_i^m&<\ln(k)\\
    n\ln(\theta_a/\theta_0)-\ln(k)&<\frac{\theta_a-\theta_0}
    {\theta_0\theta_a}\sum_{i=1}^ny_i^m.
    \end{align*}

    The end result is
    $$\sum_{i=1}^ny_i^m>\frac{\theta_0\theta_a}{\theta_a-\theta_0}
    \szdb{n\ln(\theta_a/\theta_0)-\ln(k)},$$

    or
    $$\sum_{i=1}^ny_i^m>k'.$$
    Since this rejection region does not depend on the particular alternative
    $\theta_a>\theta_0,$ the same test is most powerful against every such
    alternative, and hence is uniformly most powerful. (A numeric check of
    this reduction appears after this list.)
  2. We need to determine the distribution of $\displaystyle \sum_{i=1}^ny_i^m.$
    I claim that the random variable $W=Y^m$ is exponentially distributed with
    mean $\theta.$ Proof:
    \begin{align*}
    f_W(w)
    &=f\szdp{w^{1/m}}\frac{dw^{1/m}}{dw}\\
    &=\frac{m}{\theta}\,(w^{1/m})^{m-1}\,e^{-w/\theta}\szdp{\frac1m}\,w^{(1/m)-1}\\
    &=\frac1\theta\,w^{1-1/m}e^{-w/\theta}\,w^{(1/m)-1}\\
    &=\frac1\theta\,e^{-w/\theta},
    \end{align*}

    which is the density of an exponential with mean $\theta,$ as I
    claimed. It follows, then, that $\displaystyle\sum_{i=1}^ny_i^m$ is
    $\Gamma(n,\theta)$ distributed, and hence that
    $\displaystyle\frac{2}{\theta}\sum_{i=1}^ny_i^m$ is $\chi^2$ distributed with
    $2n$ d.o.f. So we can write the rejection region as
    $$\frac{2}{\theta_0}\sum_{i=1}^ny_i^m>\chi_\alpha^2,$$
    where $\chi_\alpha^2$ cuts off an upper-tail area of $\alpha$ under the
    $\chi^2$ distribution with $2n$ d.o.f. (A simulation check of this
    distributional claim appears after this list.) Let
    $$U(\theta)=\frac{2}{\theta}\sum_{i=1}^ny_i^m.$$
    Then we have
    \begin{align*}
    \alpha&=P\szdp{U(\theta_0)>\chi_\alpha^2}\\
    \beta&=P\szdp{U(\theta_a)<\chi_{1-\beta}^2},
    \end{align*}

    where $\chi_{1-\beta}^2$ denotes the value cutting off a lower-tail area
    of $\beta$ (equivalently, an upper-tail area of $1-\beta$).

    Both conditions pin down the same boundary value of
    $\displaystyle\sum_{i=1}^ny_i^m,$ so we solve
    \begin{align*}
    \frac{2}{\theta_0}\sum_{i=1}^ny_i^m&=\chi_\alpha^2\\
    \frac{2}{\theta_a}\sum_{i=1}^ny_i^m&=\chi_{1-\beta}^2\\
    \frac{\chi_\alpha^2\theta_0}{2}&=\frac{\chi_{1-\beta}^2\theta_a}{2}\\
    \frac{\chi_\alpha^2}{\chi_{1-\beta}^2}&=\frac{\theta_a}{\theta_0}.
    \end{align*}

    So we choose $n$ so that the $\chi^2$ values achieve this ratio. Here
    $\theta_a/\theta_0=4,$ and we vary $n$ until $\chi_\alpha^2$ (on the high
    end) and $\chi_{1-\beta}^2$ (on the low end) have a ratio of $4.$ This
    happens at d.o.f. $14=2n,$ which means we must choose $n=7.$ For this
    choice of $n,$ the critical region is
    $$\frac{2}{\theta_0}\sum_{i=1}^ny_i^m>23.6848.$$
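
Two quick numeric sanity checks on the above (minimal sketches in Python with numpy/scipy; the choice $m=2$ and all variable names are mine, not part of the problem). First, the Neyman-Pearson reduction in part 1: the log of $L(\theta_0)/L(\theta_a)$ computed directly from the densities should match the reduced form and decrease in $\sum_{i=1}^ny_i^m$ for any $\theta_a>\theta_0$:

```python
import numpy as np

m, n = 2.0, 5                      # m is the known constant (arbitrary choice)
theta0, thetaa = 100.0, 400.0

def log_lik(y, theta):
    # log L(theta) for f(y|theta) = (m/theta) y^(m-1) exp(-y^m / theta)
    return n * np.log(m / theta) + (m - 1) * np.log(y).sum() - (y ** m).sum() / theta

rng = np.random.default_rng(1)
for _ in range(3):
    # Y = theta^(1/m) * Z with Z standard Weibull(shape m) has density f(y|theta)
    y = theta0 ** (1 / m) * rng.weibull(m, n)
    t = (y ** m).sum()
    direct = log_lik(y, theta0) - log_lik(y, thetaa)
    reduced = n * np.log(thetaa / theta0) - (thetaa - theta0) / (theta0 * thetaa) * t
    print(t, direct, reduced)      # direct == reduced; linear and decreasing in t
```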

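Second, the distributional claim in part 2, namely that $U(\theta)=\frac{2}{\theta}\sum_{i=1}^nY_i^m\sim\chi^2_{2n}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, theta, n, reps = 2.0, 100.0, 7, 20_000

# Sample reps datasets of size n from f(y|theta) via the standard Weibull
y = theta ** (1 / m) * rng.weibull(m, size=(reps, n))
u = (2.0 / theta) * (y ** m).sum(axis=1)   # U(theta) for each dataset

print(stats.kstest(u, "chi2", args=(2 * n,)))              # p-value should not be small
print(np.quantile(u, 0.95), stats.chi2.ppf(0.95, 2 * n))   # nearly equal
```
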
My Question: This is one of the most complicated stats problems I've encountered yet in this textbook, and I just want to know if my solution is correct. I feel like I'm "out on a limb" with complex reasoning depending on complex reasoning. I'm fairly confident that part 1 is correct, but what about part 2?

Best Answer

You've got $L=\frac{\theta_a^n}{\theta_0^n}\,\exp\left({-\frac{\theta_a-\theta_0} {\theta_0\theta_a}\sum_{i=1}^ny_i^m}\right)<k$, which is good. Now we take the log of both sides:

$$n \log\left(\frac{\theta_a}{\theta_0}\right)+\left(\frac{\theta_0-\theta_a}{\theta_0\theta_a}\right)\sum_{i=1}^{n}{y_i^m} < \log(k)$$

and so, after dividing through by $\frac{\theta_0-\theta_a}{\theta_0\theta_a}<0$ (which flips the inequality), the rejection region takes the form: $$\left\{ \sum_{i=1}^{n}{y_i^m} > c \right\}.$$

Now, for part (b), there's something to note here: $y^m$ has an exponential distribution with mean $\theta$, and so $\sum{y^m_i}\sim \Gamma(n,\theta)$. Under the null, the statistic $\frac{2\sum_{i=1}^{n}{y_i^m}}{\theta_0}$ has a $\chi^2$ distribution with $2n$ degrees of freedom (look for the relation between gamma and chi-squared), so the rejection region can be calibrated through $\frac{2\sum_{i=1}^{n}{y_i^m}}{\theta_0} > \frac{2c}{\theta_0}$.
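
The gamma/chi-squared relation used here is easy to confirm with scipy (a minimal check; $\Gamma(n,\theta)$ is taken as shape $n$, scale $\theta$, matching the parameterization above):

```python
from scipy import stats

n, theta = 6, 100.0

# If T ~ Gamma(shape=n, scale=theta), then 2T/theta ~ chi-square(2n):
t95 = stats.gamma.ppf(0.95, a=n, scale=theta)
print(2 * t95 / theta)               # ≈ 21.026
print(stats.chi2.ppf(0.95, 2 * n))   # ≈ 21.026 as well
```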

Now let's solve (b):

$$\theta_0=100,\theta_a=400,\alpha=0.05,\beta=0.05$$

When $H_0$ is true, we get $\alpha$ using:

$$\alpha=P\left(\frac{2\sum_{i=1}^{n}{y_i^m}}{100} > \chi^2_{0.05}\right)=0.05.$$

When $H_a$ is true, we get $\beta$ using:

$$\beta=P\left(\frac{2\sum_{i=1}^{n}{y_i^m}}{100} \le \chi^2_{0.05} \middle| \theta=400\right)=P\left(\frac{2\sum_{i=1}^{n}{y_i^m}}{400} \le \frac{1}{4}\chi^2_{0.05} \middle| \theta=400\right)=P\left(\chi^2\le\frac{1}{4}\chi^2_{0.05}\right)=0.05$$

So, we need to find the row in the $\chi^2$ table where $\frac{1}{4}\chi^2_{0.05}=\chi^2_{0.95}$:

[Image: chi-squared distribution table]

You can see that for $12$ degrees of freedom, $\chi^2_{0.95}=5.226$ and $\chi^2_{0.05}=21.03$, which is the closest we get to achieving $\frac{1}{4}\chi^2_{0.05}=\chi^2_{0.95}$. Recall that this has $2n$ degrees of freedom, so the appropriate sample size is $n=6$, and the critical region is $\frac{2}{100}\sum_{i=1}^{6}{y_i^m} > 21.03$, equivalently $\sum_{i=1}^{6}{y_i^m} > 1051.5$.
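
For completeness, here's a scipy sketch that reproduces the search over $n$ and spells out the critical region (the scan range and variable names are mine):

```python
from scipy import stats

theta0, thetaa, alpha, beta = 100.0, 400.0, 0.05, 0.05
target = thetaa / theta0   # = 4

# Scan even d.o.f. for the best match to chi2_upper(alpha) / chi2_lower(beta) = 4
df = min(
    range(2, 31, 2),
    key=lambda d: abs(stats.chi2.ppf(1 - alpha, d) / stats.chi2.ppf(beta, d) - target),
)
n = df // 2
crit = stats.chi2.ppf(1 - alpha, df)   # upper 5% point with 2n d.o.f.
print(n, df, crit)                     # 6 12 21.026...
print("reject when sum(y_i^m) >", theta0 * crit / 2)   # ≈ 1051.3
```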