Solved – Neyman-Pearson lemma application

hypothesis-testing, self-study

A sample of size $n$ is taken from a random variable with probability density function $f(x)=(\theta+1)x^{\theta}$, $0 \leq x \leq 1$. Use the Neyman-Pearson Lemma to determine the form of the critical region for the most powerful test of $H_{0}: \theta=1$ against $H_{1}: \theta =2$. What I did was I found the likelihood function of $H_{0}$ and $H_{1}$, which gave me $2^{n}x^{n}$ and $3^{n}x^{2n}$ respectively. Then I obtained that $2^{n}x^{n}/3^{n}x^{2n} < c$ and simplified that expression. However, I was wrong, so any help is appreciated.

Best Answer

Likelihood function

If the individual members in the sample are independent then the likelihood function can be based on the joint probability distribution (which is the product of the individual distributions):

$$\mathcal{L}(\theta;\mathbf{x}) = \prod_{i=1}^n f(x_i|\theta) = \prod_{i=1}^n (\theta + 1)x_i^\theta = (\theta+1)^n P(\mathbf{x})^\theta$$

where $P(\mathbf{x}) = \prod_{i=1}^n x_i$

and you get for the likelihood ratio:

$$\Lambda(\mathbf{x}) = \frac{\mathcal{L}(\theta_1;\mathbf{x})}{\mathcal{L}(\theta_2;\mathbf{x})} = \left( \frac{\theta_1+1}{\theta_2+1} \right)^n P(\mathbf{x})^{\theta_1-\theta_2}$$
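The ratio depends on the data only through $P(\mathbf{x})$, and since $\theta_1 < \theta_2$ its exponent is negative: $\Lambda$ shrinks as the product grows. A minimal sketch (not part of the original answer) verifying this numerically:

```python
import math

def likelihood_ratio(xs, theta1=1, theta2=2):
    # Lambda = L(theta1; x) / L(theta2; x) for f(x) = (theta + 1) * x**theta
    n = len(xs)
    P = math.prod(xs)  # the product statistic P(x)
    return ((theta1 + 1) / (theta2 + 1)) ** n * P ** (theta1 - theta2)

# With theta1 < theta2 the exponent of P is negative, so a larger
# product P gives a smaller ratio Lambda:
xs_small = [0.2, 0.3, 0.4]   # P(x) = 0.024
xs_large = [0.7, 0.8, 0.9]   # P(x) = 0.504
assert likelihood_ratio(xs_small) > likelihood_ratio(xs_large)
```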


Distribution of $P(\mathbf{x}) = \prod_{i=1}^n x_i$

$P(\mathbf{x}) = \prod_{i=1}^n x_i$ is the statistic that we can use to fully describe the likelihood ratio. Let's have a look how it is distributed.

We can find the distribution of $P$ by using the general formula for a product distribution:

$$ f(x_1 \cdot x_2 = p) = \int_p^1 x_2^{-1}f(p/x_2) f(x_2) dx_2 = (\theta+1)^2 p^\theta (-\log p) $$

a process which you can continue; more generally, by induction,

$$f(p) = \frac{(\theta+1)^n p^\theta(-\log p)^{n-1}}{(n-1)!}$$

In a plot these densities look as follows:

[figure: distribution of the product $\prod x_i$ under $\theta=1$ and $\theta=2$, for $n=3$ and $n=6$]
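This density can be checked by simulation (a sketch, not part of the original answer): since $F(x)=x^{\theta+1}$ on $[0,1]$, a draw from $f$ is obtained by inverse-transform sampling as $U^{1/(\theta+1)}$ with $U$ uniform:

```python
import math
import random

def sample_P(n, theta, rng):
    # inverse-transform sampling: F(x) = x**(theta+1), so x = U**(1/(theta+1))
    return math.prod(rng.random() ** (1 / (theta + 1)) for _ in range(n))

def f_P(p, n, theta):
    # the claimed density of P = prod(x_i)
    return ((theta + 1) ** n * p ** theta
            * (-math.log(p)) ** (n - 1) / math.factorial(n - 1))

rng = random.Random(0)
n, theta = 3, 1
samples = [sample_P(n, theta, rng) for _ in range(200_000)]

# compare the empirical mass of a small interval with density * width
a, b = 0.2, 0.25
empirical = sum(a < p < b for p in samples) / len(samples)
predicted = f_P((a + b) / 2, n, theta) * (b - a)
print(empirical, predicted)  # the two numbers should be close
```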


Principle of the Neyman-Pearson lemma

What you should note about these graphs is that the probability density for observing a particular $P$ is not the same for each $\theta$: under the hypothesis $\theta=1$ you are more likely to get smaller values of $P$ than under the hypothesis $\theta = 2$.

  • Now the point of the Neyman-Pearson lemma is to find a region where the probability of rejecting the null hypothesis, given that the null hypothesis is true, equals the chosen $\alpha$ level (e.g. 5%), while the probability of rejecting the null hypothesis, given that the alternative hypothesis is true, is as large as possible. This occurs when we choose the regions where the ratio $\Lambda$ is smallest.

In the first image ($n=3$) you see that the power is not very high: only 12.6%, i.e. a $100-12.6 = 87.4\%$ probability of a type II error when $\theta=2$. This power increases with $n$: in the second image ($n=6$) it has increased to 20.3%.
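These power figures can be reproduced by simulation (a sketch under my own setup, not code from the original answer): take the critical value of $P$ as the empirical $(1-\alpha)$ quantile under $H_0$, then count how often $P$ exceeds it under $H_1$:

```python
import math
import random

rng = random.Random(1)

def sample_P(n, theta):
    # draw n observations via inverse-transform sampling and take their product
    return math.prod(rng.random() ** (1 / (theta + 1)) for _ in range(n))

def simulated_power(n, reps=100_000, alpha=0.05):
    # critical value: empirical (1 - alpha) quantile of P under H0 (theta = 1)
    under_h0 = sorted(sample_P(n, 1) for _ in range(reps))
    c = under_h0[int((1 - alpha) * reps)]
    # power: fraction of samples under H1 (theta = 2) with P above the cutoff
    under_h1 = [sample_P(n, 2) for _ in range(reps)]
    return sum(p > c for p in under_h1) / reps

print(simulated_power(3))  # close to the 12.6% quoted for n = 3
print(simulated_power(6))  # close to the 20.3% quoted for n = 6
```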


Choosing a critical region

The critical region is found by setting

$$Pr(\Lambda < c_\Lambda | H_0) = \alpha$$

so that we commit a type I error (falsely rejecting $H_0$ when $H_0$ is true) in only a fraction $\alpha$ of the cases.

To express this (in the case of the exercise) we can use $P(\mathbf{x}) = \prod_{i=1}^n x_i$, the statistic that fully describes the likelihood ratio. Since $\theta_1<\theta_2$, the exponent $\theta_1-\theta_2$ is negative, so $\Lambda$ falling below some critical level is equivalent to $P(\mathbf{x})$ rising above some critical level. Thus we look for:

$$Pr(P(\mathbf{x})>c_P|H_0) = \alpha$$

which can be found using the earlier expressed distribution function of $P(\mathbf{x})$

$$F(p) = \int_0^p f(x) dx = \frac{(\theta+1)^n}{(n-1)!} \int_0^p x^\theta (-\log x)^{n-1} dx = 1-\alpha$$

Now this integral can be evaluated analytically:

$$\frac{(\theta+1)^n}{(n-1)!} \int_0^p x^\theta (-\log x)^{n-1} dx = p^{\theta+1} \sum_{i=0}^{n-1}\frac{(-(\theta+1) \log p)^{i}}{i!} = 1-\alpha$$

This is an implicit equation for $p$; I am not sure whether it can be solved in closed form, but it is not difficult to solve with computational methods.
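For instance (a sketch, with bisection as one possible root-finding method), the cutoff $c_P$ and the resulting power can be computed from the closed-form CDF above:

```python
import math

def F(p, n, theta):
    # CDF of P:  p**(theta+1) * sum_{i=0}^{n-1} (-(theta+1) ln p)**i / i!
    z = -(theta + 1) * math.log(p)
    return p ** (theta + 1) * sum(z ** i / math.factorial(i) for i in range(n))

def critical_value(n, alpha=0.05, theta0=1):
    # solve F(c; theta0) = 1 - alpha by bisection (F is increasing in p);
    # the test rejects H0 when P(x) > c
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        if F(mid, n, theta0) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (3, 6):
    c = critical_value(n)
    power = 1 - F(c, n, theta=2)
    # reproduces the 12.6% (n=3) and 20.3% (n=6) power figures above
    print(f"n={n}: c_P={c:.3f}, power={power:.1%}")
```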

(Alternatively, you could express it in terms of the regularized incomplete gamma function. This relates also to the comment from StubbornAtom that the product of $x_i$ is related to the sum of $\log(x_i)$ and the distribution of these is a gamma distribution.)
