Here is an example, with 20 normal observations in each of two groups.
sort(x1)
[1] 73 78 80 89 90 90 90 93 94 94 100 100 106 109 112 120 123 124 128 139
sort(x2)
[1] 64 65 75 76 77 77 79 80 83 86 89 93 94 95 100 100 102 105 115 117
Boxplots (sample 1 on the bottom)
Testing $H_0: \mu_1 = \mu_2$ against $H_a: \mu_1 \ne \mu_2$ with a Welch (separate-variances) two-sample t test, we obtain the P-value 0.018, and so
we can reject $H_0$ at the 2% level of significance (results from R statistical software):
t.test(x1, x2)
Welch Two Sample t-test
data: x1 and x2
t = 2.4783, df = 36.793, p-value = 0.01791
alternative hypothesis: true difference in means is not equal to 0
sample estimates:
mean of x mean of y
101.6 88.6
Testing $H_0: \mu_1 \le \mu_2$ against $H_a: \mu_1 > \mu_2$ with the same procedure in R (except for the parameter alte="greater"), we obtain the P-value 0.009, and so
we can reject $H_0$ at the 1% level of significance. Notice that the P-value
is half as large as for the two-sided test.
t.test(x1, x2, alte="greater")
Welch Two Sample t-test
data: x1 and x2
t = 2.4783, df = 36.793, p-value = 0.008957
alternative hypothesis: true difference in means is greater than 0
sample estimates:
mean of x mean of y
101.6 88.6
If we test $H_0: \mu_1 \ge \mu_2$ against $H_a: \mu_1 < \mu_2,$ we obtain a P-value greater than 1/2. We cannot reject here because the sample means are $\bar X_1 = 101.6 > \bar X_2 = 88.6.$ So (as mentioned by @PhilH) we obviously have no reason to believe that the population means would have $\mu_1 < \mu_2.$
t.test(x1, x2, alte="less")
Welch Two Sample t-test
data: x1 and x2
t = 2.4783, df = 36.793, p-value = 0.991
alternative hypothesis: true difference in means is less than 0
sample estimates:
mean of x mean of y
101.6 88.6
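For reference, all three tests above can be reproduced by typing the two samples in from the sorted values shown at the top of the answer:

```r
# The two samples, entered from the sorted values printed above
x1 <- c(73, 78, 80, 89, 90, 90, 90, 93, 94, 94,
        100, 100, 106, 109, 112, 120, 123, 124, 128, 139)
x2 <- c(64, 65, 75, 76, 77, 77, 79, 80, 83, 86,
        89, 93, 94, 95, 100, 100, 102, 105, 115, 117)
mean(x1)  # 101.6
mean(x2)  # 88.6
t.test(x1, x2)$p.value                           # two-sided: 0.01791
t.test(x1, x2, alternative = "greater")$p.value  # one-sided: 0.008957
t.test(x1, x2, alternative = "less")$p.value     # wrong direction: 0.991
```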
Notes: (a) Notice that whether the test is one-sided or two-sided, the
null hypothesis must always contain an $=$-sign (as $=, \le,$ or $\ge$).
(b) The Welch two-sample t test is generally preferred over the 'pooled'
two-sample t test because it does not make the assumption that population
variances are equal. For these data the pooled test would give similar results, but with df = 38. [In R you can get a pooled test by using the parameter var.eq=T.] (c) I have edited the R output slightly for relevance, but not
changed any of the numbers. (d) Most statistical software and statistical
calculators will do both one-sided and two-sided tests. In each case, look
for the way to specify what kind of test you want. [In R, it's the parameter alte (or alternative).]
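To illustrate note (b), here is the pooled test on the same data (the samples are re-entered so the snippet runs on its own; var.eq is a partial match for the full parameter name var.equal):

```r
# Samples re-entered from the sorted values shown above
x1 <- c(73, 78, 80, 89, 90, 90, 90, 93, 94, 94,
        100, 100, 106, 109, 112, 120, 123, 124, 128, 139)
x2 <- c(64, 65, 75, 76, 77, 77, 79, 80, 83, 86,
        89, 93, 94, 95, 100, 100, 102, 105, 115, 117)
# Pooled (equal-variance) test: df = n1 + n2 - 2 = 38
tt <- t.test(x1, x2, var.equal = TRUE)
tt$parameter  # df = 38
```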
With $T(X)=\min\limits_{1\le i\le n} X_i$, we have the exact distribution $n(T-\theta)\sim \mathsf{Exp}(1)$, which is the same as $$2n(T-\theta)\sim \chi^2_2\tag{*}$$
There is more than one way to derive a test for testing $H_0:\theta\le\theta_0$ against $H_1:\theta>\theta_0$.
One elementary way is to find the test corresponding to a given confidence interval for $\theta$ based on $T$.
Using the pivot in $(*)$, we have $$P_{\theta}\left(2n(T-\theta)< \chi^2_{2,\alpha}\right)=1-\alpha\quad\forall\,\theta\in\mathbb R\,,$$
where $\chi^2_{2,\alpha}$ denotes the $(1-\alpha)$th quantile of a $\chi^2_2$ distribution.
Or, $$P_{\theta}\left(\theta> T-\frac{\chi^2_{2,\alpha}}{2n}\right)=1-\alpha\quad\forall\,\theta\tag{**}$$
This says that a (one-sided) $100(1-\alpha)\%$ confidence interval for $\theta$ is $$I(X)=\left(T-\frac{\chi^2_{2,\alpha}}{2n},\infty\right)$$
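As a quick sanity check (a sketch, not part of the derivation: the sample size, $\theta$, and seed below are arbitrary choices), the coverage of this interval can be verified by simulation in R:

```r
# Monte Carlo check of the one-sided interval I(X) = (T - qchisq(1 - alpha, 2)/(2n), Inf)
# for shifted-exponential data X_i = theta + Exp(1); coverage should be close to 0.95.
set.seed(1)
n <- 10; theta <- 5; alpha <- 0.05
covered <- replicate(1e5, {
  T <- min(theta + rexp(n))                      # T = min of the sample
  theta > T - qchisq(1 - alpha, df = 2) / (2 * n)  # does I(X) contain theta?
})
mean(covered)  # approximately 0.95
```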
From $(**)$, we can say that for some $\theta_0$,
$$P_{\theta_0}\left(\theta_0< T-\frac{\chi^2_{2,\alpha}}{2n}\right)=P_{\theta_0}\left(T>\theta_0+\frac{\chi^2_{2,\alpha}}{2n}\right)=\alpha$$
So the size $\alpha$ test obtained by 'inverting' the interval $I$ rejects $H_0$ whenever $T>\theta_0+\frac{\chi^2_{2,\alpha}}{2n}$.
It can be shown that this test is in fact the likelihood ratio test and also the uniformly most powerful test (by the Karlin–Rubin theorem, since this family has a monotone likelihood ratio in $T$) for testing $H_0$ against $H_1$.
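The pivot $(*)$ and the size of the resulting test can likewise be checked by simulation (again a sketch with arbitrary $n$, $\theta_0$, and seed):

```r
# At the boundary theta = theta0, the test rejecting when
# T > theta0 + qchisq(1 - alpha, 2)/(2n) should reject with probability alpha.
set.seed(2)
n <- 10; theta0 <- 5; alpha <- 0.05
crit <- theta0 + qchisq(1 - alpha, df = 2) / (2 * n)
reject <- replicate(1e5, min(theta0 + rexp(n)) > crit)
mean(reject)  # approximately 0.05

# The pivot itself: 2n(T - theta) should be chi-squared with 2 df (mean 2)
pivot <- replicate(1e5, 2 * n * (min(theta0 + rexp(n)) - theta0))
mean(pivot)   # approximately 2
```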
Best Answer
Several approaches are possible. One is to get an exact P-value and reject $H_0$ at the 5% level if it is smaller than 0.05.
Intuitively, you have observed $X = 5$ which might be taken as evidence that $\theta > 3.$ The question is whether 5 is enough bigger than 3 to be considered 'significantly' bigger and thus reject $H_0.$
Formally, the $=$-sign in the null hypothesis determines the 'null distribution' used in testing. Here that's $\mathsf{Pois}(\theta = 3).$ The P-value is the probability of a result 'as extreme or more extreme' than 5 (in the direction of $H_1$). That means we want $P(X \ge 5\,|\,\theta = 3).$ You can evaluate that using the Poisson PMF, a printed table of Poisson probabilities (if available), or software. (I don't think this is a good situation for a normal approximation.) In R statistical software (where ppois is the Poisson CDF) we use $P(X \ge 5) = 1 - P(X \le 4) = 0.1847.$ Thus the P-value exceeds 5% and we do not reject $H_0.$ The second computation in R sums five probabilities: $P(X = 0), \dots, P(X=4),$ where $X \sim \mathsf{Pois}(3);$ that may be mildly tedious, but it is certainly possible to do on a calculator.
In the figure below, the P-value is the sum of the heights of the black bars to the right of the vertical red dashed line.
Note: You might be wondering just how large $X$ would have to be in order to reject $H_0.$ A computation in R shows that $X = 7$ would lead to rejection at the $3.35\%$ level.
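Both computations use only the base-R Poisson functions ppois and dpois:

```r
# Exact P-value: P(X >= 5 | theta = 3) = 1 - P(X <= 4)
1 - ppois(4, lambda = 3)     # 0.1847
sum(dpois(0:4, lambda = 3))  # the five summed probabilities: P(X <= 4) = 0.8153

# Smallest observed count that rejects at the 5% level
x <- 0:20
pvals <- 1 - ppois(x - 1, lambda = 3)  # P(X >= x) for each x
min(x[pvals <= 0.05])                  # 7, with P(X >= 7) = 0.0335
```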