A detail in the proof of the Neyman–Pearson Lemma (Sufficiency Part)

integration, statistical-inference, statistics

This is Theorem 8.3.12 (the Neyman–Pearson Lemma) in Casella and Berger's *Statistical Inference*. Consider testing $H_0: \theta=\theta_0$ versus $H_1: \theta=\theta_1$, where the pdf or pmf corresponding to $\theta_i$ is $f(\textbf{x}|\theta_i)$, $i=0,1$, using a test with rejection region $R$ that satisfies

$$\textbf{x}\in R \text{ if } f(\textbf{x}|\theta_1)>kf(\textbf{x}|\theta_0)$$ and
$$\textbf{x}\in R^c \text{ if } f(\textbf{x}|\theta_1)<kf(\textbf{x}|\theta_0)$$ for some $k \geq 0$, and
$$\alpha=P_{\theta_0}(\textbf{X}\in R).$$

Then (Sufficiency): Any test that satisfies the above is a UMP level $\alpha$ test.
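To make the rejection region concrete, here is a minimal numeric sketch (my own toy example, not from the book), assuming a single observation $X \sim N(\theta,1)$ with $\theta_0 = 0$ and $\theta_1 = 1$:

```python
# Toy sketch of the NP test for H0: theta = 0 vs H1: theta = 1,
# one observation X ~ N(theta, 1).  Here the likelihood ratio
# f(x|theta1)/f(x|theta0) = exp(x - 1/2) is increasing in x, so the
# region {x : f(x|theta1) > k f(x|theta0)} is {x : x > c} for some c.
from scipy.stats import norm

theta0, theta1, alpha = 0.0, 1.0, 0.05

c = norm.ppf(1 - alpha, loc=theta0)   # choose c so that P_{theta0}(X > c) = alpha
k = norm.pdf(c, loc=theta1) / norm.pdf(c, loc=theta0)  # the k at the boundary

print(f"reject H0 when x > {c:.3f} (i.e. when the ratio exceeds k = {k:.3f})")
print("size of the test:", 1 - norm.cdf(c, loc=theta0))  # = alpha by construction
```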

In the proof, I understand that
$$[\phi(\textbf{x})-\phi'(\textbf{x})][f(\textbf{x}|\theta_1)-kf(\textbf{x}|\theta_0)] \ge 0.$$
But I don't understand why this inequality still holds after we integrate both sides. The accepted answer to the question "Is it always true that the integral of a nonnegative function is nonnegative?" suggested to me that this is not the case.

So I am confused.


Best Answer

You are confusing the notions of "antiderivative" and "definite integral".

In the question you referenced, the answer is saying that the ANTIDERIVATIVE can be negative. For example, as you know:

$$\int f(x)\,dx = F(x) + C$$

where $C$ can be anything, like a negative number. For example, if we take $C = -4$, then

$$\int 0 dx = C = -4$$

which is correct since we found a FUNCTION s.t. $(-4)' = 0$.

But this is not the same as in Casella–Berger. C&B refers to the DEFINITE integral over all sample points $\textbf{x} \in \mathcal{X}$ (see the full definition in 8.3.3).

In our case above, if we add limits of integration and keep $F(x) = C = -4$:

$$\int_{a}^b 0 dx = F(x)|_a^b = (-4)|_a^b = (-4) - (-4) = 0$$

Thus, we have shown that the definite integral and the antiderivative are not the same thing: the former is a number (which is zero if the integrand is zero), while the latter is a function (which can be negative even if the integrand is zero).
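A quick symbolic check of this distinction (a sketch using sympy; the choice $F(x) = -4$ just mirrors the example above):

```python
import sympy as sp

x, a, b = sp.symbols('x a b')

# F(x) = -4 is a perfectly valid antiderivative of 0, and it is negative:
F = sp.Integer(-4)
print(sp.diff(F, x))               # -> 0, so F' = 0 as required

# But the DEFINITE integral of 0 is a number, and that number is 0:
print(sp.integrate(0, (x, a, b)))  # -> 0, whatever a and b are
```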

Now, as for why the definite integral of a nonnegative function is nonnegative, we can write the Riemann integral as a limit of sums:

$$\int_a^b f(x)dx = \lim_{n \rightarrow \infty} \sum_{i = 1}^n (x_i - x_{i-1})f(x_i)$$

where $\{x_0, \dots, x_n\}$ is a partition of the interval $[a, b]$ whose mesh shrinks to $0$ as $n \to \infty$. Then, since each term $x_i - x_{i-1}$ is positive, knowing that $f(x) \geq 0$ gives

$$\sum_{i = 1}^n (x_i - x_{i-1})f(x_i) \geq \sum_{i = 1}^n (x_i - x_{i-1})0 = 0$$

and hence, passing to the limit,

$$\int_a^b f(x)dx = \lim_{n \rightarrow \infty}\sum_{i = 1}^n (x_i - x_{i-1})f(x_i) \geq \lim_{n \rightarrow \infty}\sum_{i = 1}^n (x_i - x_{i-1})0 = 0$$
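The same fact is easy to check numerically (a small sketch; the integrand $f(t) = t^2 e^{-t}$ and the grid are my own arbitrary choices):

```python
import numpy as np

# A Riemann sum of a nonnegative function is a sum of nonnegative
# terms (x_i - x_{i-1}) * f(x_i), hence itself nonnegative.
f = lambda t: t**2 * np.exp(-t)   # an arbitrary nonnegative integrand
a, b, n = 0.0, 5.0, 10_000

x = np.linspace(a, b, n + 1)      # partition a = x_0 < x_1 < ... < x_n = b
riemann_sum = np.sum((x[1:] - x[:-1]) * f(x[1:]))

print(riemann_sum)                # > 0, and it approximates the integral
assert riemann_sum >= 0
```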

So Casella & Berger do the same thing, but with the nonnegative integrand $[\phi(\textbf{x})-\phi'(\textbf{x})][f(\textbf{x}|\theta_1)-kf(\textbf{x}|\theta_0)]$ integrated (or summed, in the pmf case) over the sample space $\mathcal{X}$, so the inequality is preserved.
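For instance (my own toy check, not code from the book), one can verify numerically that with the NP test $\phi$ and an arbitrary competing test $\phi'$, the integrand $[\phi(\textbf{x})-\phi'(\textbf{x})][f(\textbf{x}|\theta_1)-kf(\textbf{x}|\theta_0)]$ is pointwise nonnegative, and therefore so is its Riemann sum over a grid of sample points:

```python
import numpy as np
from scipy.stats import norm

# Toy check of the key step, with X ~ N(theta, 1), theta0 = 0, theta1 = 1,
# and an arbitrary k >= 0.
theta0, theta1, k = 0.0, 1.0, 3.0

x = np.linspace(-10, 10, 200_001)                 # a grid over the sample space
f0, f1 = norm.pdf(x, loc=theta0), norm.pdf(x, loc=theta1)

phi = (f1 > k * f0).astype(float)                 # the NP test of the lemma
rng = np.random.default_rng(0)
phi_prime = rng.uniform(0.0, 1.0, size=x.size)    # an arbitrary competing test

integrand = (phi - phi_prime) * (f1 - k * f0)
assert (integrand >= 0).all()                     # pointwise nonnegative
print(np.sum(integrand[:-1] * np.diff(x)))        # its Riemann sum is >= 0 too
```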
