Sample size as a part of minimal sufficient statistic

statistical-inference, statistics

In Casella and Berger, we have the following experiment: assume we first select a positive integer at random, call it $N \in \{1, 2, \ldots \}$, such that $P(N = n) = p_n$. Then, given $N = n$, we perform a classic binomial experiment with parameter $\theta$. The number of positives/ones in the sample is $X$. We are asked to show that $(X, N)$ is a minimal sufficient statistic.

I found the joint distribution of the statistic:

$$f(x, n) = {n\choose x}\theta^x(1 - \theta)^{n - x}p_n$$

and the sample:

$$f(x_1, \ldots, x_n \mid \theta) = \sum_{n = 1}^\infty p_n \theta^{\sum_{i = 1}^n x_i}(1-\theta)^{n - \sum_{i = 1}^n x_i}$$

By the factorization theorem, we can show that this is a sufficient statistic. This series must converge (it is a marginal), and we can view the above as a function of $X = \sum_{i = 1}^N X_i$ and $N$, while taking $h(x) = 1$.

But how do I show it is a minimal sufficient statistic? I do not have the closed form, and I need to show that:

$$\frac{f(x_1, \ldots, x_{n_1} \mid \theta)}{f(y_1,\ldots,y_{n_2} \mid \theta)} = C \iff (X, N_1) = (Y, N_2)$$

How can I show this?

Additionally, the solution manual uses the joint $f_{X, N}(x, n)$ to show the above, which is simple of course. But I think this is a mistake, because the theorem in the book requires using the joint of the SAMPLE, not the joint of the statistic (Theorem 6.2.13, page 281, second edition).

Am I right in considering this a mistake in the solution manual?

PS. Solution manual link: http://www.ams.sunysb.edu/~zhu/ams570/Solutions-Casella-Berger.pdf, exercise 6.12 (a).

PPS. I think my confusion might come from defining the joint vs. marginal distributions of $x_1, \ldots, x_n$ and $N = n$. How can we define $P(x_1, \ldots, x_n \mid N = n)$ and $P(x_1, \ldots, x_n, n)$? If I observe a sample, then $N$ is a function of the $x$'s: I can count them. However, having a joint distribution and then marginalising $N$ out also seems like a correct answer…

Best Answer

Your formula $$f(x_1, \ldots, x_n \mid \theta) = \sum_{n = 1}^\infty p_n \theta^{\sum_{i = 1}^n x_i}(1-\theta)^{n - \sum_{i = 1}^n x_i}$$ is problematic in two ways.

First, you use $n$ twice in two different contexts: you use it to denote the sample size on the left-hand side $x_1, x_2, \ldots, x_n$, but then you also use it as the index of summation on the right-hand side, where $n \in \{1, 2, 3, \ldots\}.$ This is clearly wrong.

Second, this expression reveals a misconception on your part regarding the nature of the parametric model and what the sample looks like. As the question is posed, there is but a single binomial random variable $X$. Your joint PMF implies there is an arbitrary number of independent binomial random variables, possibly with different sample sizes.

To specify the model concretely, it is this: $$N \sim \operatorname{Categorical}(p_1, p_2, \ldots ), \\ \Pr[N = n] = p_n, \quad \sum_{n=1}^\infty p_n = 1. \\ B_i \sim \operatorname{Bernoulli}(\theta), \\ \Pr[B_i = 1] = \theta, \quad 0 < \theta < 1. \\ X = \sum_{i=1}^N B_i, \\ X \mid N \sim \operatorname{Binomial}(N,\theta), \\ \Pr[X = x \mid N = n] = \binom{n}{x} \theta^x (1 - \theta)^{n-x}, \quad x \in \{0, 1, \ldots, n \}.$$ The sample in this case is not some set of IID binomial variables, but rather the set $\{B_i\}_{i=1}^N$ of Bernoulli variables, from which a single realization $X$ is obtained, as the question clearly states.
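
To make the two-stage structure concrete, here is a minimal simulation sketch in Python; the geometric prior $p_n = (1/2)^n$, the value of $\theta$, and the helper name `draw_once` are illustrative choices of mine, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3  # illustrative value of the unknown parameter

def draw_once():
    # Stage 1: draw the random sample size N, here with P(N = n) = (1/2)^n.
    n = rng.geometric(0.5)
    # Stage 2: draw the Bernoulli sample B_1, ..., B_n and record X = sum of B_i.
    b = rng.binomial(1, theta, size=n)
    return b.sum(), n

draws = [draw_once() for _ in range(100_000)]
xs, ns = np.array(draws).T
# Sanity check: E[X] = theta * E[N] (here E[N] = 2 under the geometric prior).
print(xs.mean(), theta * ns.mean())
```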

How does this affect the subsequent computation? Well, what the "official" solution doesn't explicitly state, but is helpful to keep in mind, is that there is no loss of information about $\theta$ when we take the sum $X$ rather than the sample $(B_1, \ldots, B_N)$; that is to say, $X$ is sufficient for $\theta$ when $N$ is fixed. When we write the joint PMF for $(X, N)$ in factored form, we do not have access to the $B_i$, but we don't need them, because knowing the realization of $N$ guarantees that $X$ has not discarded information about $\theta$. This is what lets us treat the sample as the single ordered pair $(X, N)$, rather than $(B_1, \ldots, B_N)$: some data reduction has already taken place.
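
To spell that claim out: for fixed $N = n$, the conditional distribution of the Bernoulli vector given its sum is $$\Pr[B_1 = b_1, \ldots, B_n = b_n \mid X = x, N = n] = \frac{\theta^x (1 - \theta)^{n - x}}{\binom{n}{x} \theta^x (1 - \theta)^{n - x}} = \binom{n}{x}^{-1}$$ for any $(b_1, \ldots, b_n)$ with $\sum_i b_i = x$, which is free of $\theta$; this is exactly the statement that $X$ is sufficient for $\theta$ once $N$ is known.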

However, we can also write and factor $$\begin{align} f(\boldsymbol b , n \mid \theta) &= \left(\prod_{i=1}^n \theta^{b_i} (1 - \theta)^{1 - b_i} \mathbb 1 (b_i \in \{0, 1\})\right) p_n \mathbb 1 (n \in \mathbb Z^+) \\ &= \theta^x (1-\theta)^{n-x} p_n \mathbb 1 (x \in \{0, \ldots, n\}) \mathbb 1 (n \in \mathbb Z^+) \\ \end{align}$$ where $$x = \sum_{i=1}^n \mathbb 1 (b_i = 1),$$ hence the choice $$h(\boldsymbol b, n) = p_n \mathbb 1 (x \in \{0, \ldots, n\}) \mathbb 1 (n \in \mathbb Z^+), \\ T(\boldsymbol b, n) = (t_1(\boldsymbol b), t_2(n)) = (x,n), \\ g((t_1, t_2) \mid \theta) = \theta^{t_1} (1 - \theta)^{t_2 - t_1}. $$ Note that $p_n$ belongs in $h$, since it does not depend on $\theta$ (equally, because $n$ is a coordinate of $T$, it could be absorbed into $g$); it cannot simply be dropped. Thus $(x,n)$ is sufficient for $\theta$ (remember, $T$ is a function of the $b_i$ and $n$, since $x$ is a function of these). The conclusion for sufficiency is the same whether we use a binomial or Bernoulli model. Having established this, for minimal sufficiency it is best to work from $X$ directly, which is what is done in the official solution.
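
As an informal sanity check (not a proof) of the minimal-sufficiency criterion, one can tabulate the ratio $f_{X,N}(x_1, n_1 \mid \theta) / f_{X,N}(x_2, n_2 \mid \theta)$ over a grid of $\theta$ values and observe that it is constant exactly when $(x_1, n_1) = (x_2, n_2)$. A sketch, again assuming the illustrative prior $p_n = (1/2)^n$:

```python
import numpy as np
from math import comb

def joint_pmf(x, n, theta):
    # f(x, n | theta) = C(n, x) * theta^x * (1 - theta)^(n - x) * p_n
    return comb(n, x) * theta**x * (1 - theta)**(n - x) * 0.5**n

thetas = np.linspace(0.05, 0.95, 7)

for (x1, n1), (x2, n2) in [((2, 5), (2, 5)), ((2, 5), (3, 5)), ((2, 5), (2, 6))]:
    ratios = joint_pmf(x1, n1, thetas) / joint_pmf(x2, n2, thetas)
    print((x1, n1), (x2, n2), "constant in theta:", np.allclose(ratios, ratios[0]))
# Expected: True for the matching pair, False for the other two.
```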
