Sufficient statistics for $p$ for a random sample from $\text{Ber}(p)$ distribution

binomial distribution, estimation, statistical-inference, statistics

Let $X_1, X_2, X_3$ be a random sample from the Bernoulli distribution $B(p)$. Which of the following is a sufficient statistic for $p$?

$(A)\ \ X_{1}^{2}+X_{2}^{2}+X_{3}^{2}$

$(B) \ \ X_1+2X_{2}+X_{3}$

$(C)\ \ 2X_1-X_{2}-X_{3}$

$(D) \ \ X_1+X_{2}$

$(E)\ \ 3X_1+2X_{2}-4X_{3}$

We know that $T=\sum_{i=1}^3 X_i$ is a sufficient statistic for $p$ by the Neyman factorization theorem.

My reasoning for the different options:

$(A)$ It's not a one-to-one function of the sufficient statistic $T$. (If it were $T^2$, would it be a sufficient statistic then? My answer is yes, because $T^2$ is a one-to-one function of the sufficient statistic when the random variable is nonnegative. Am I right?)

$(B)$ $X_1+2X_2+X_3=(X_1+X_2+X_3)+X_2$. It contains the original statistic, therefore it is a sufficient statistic for $p$.

$(D)$ It doesn't include $X_3$, so it's not a sufficient statistic for $p$.

All the other options involve subtraction, so I ruled them out.

I think my reasoning is not very good; I lack some intuition for finding sufficient statistics. Please correct me.

Best Answer

$T=X_1+X_2+X_3$ is a minimal sufficient statistic for $p$, which means that it is a function of every other sufficient statistic. Since $T$ cannot be recovered from $X_1+X_2$ (the value of $X_3$ is lost), option (D) is eliminated.

Now $X_i^2$ is a one-to-one function of $X_i$ because $X_i\in\{0,1\}$ for each $i$; in fact $X_i^2=X_i$.

So $X_1^2+X_2^2+X_3^2$ equals $T$ and is in particular a one-to-one function of $T$, implying that the former is sufficient for $p$. (Thanks to @Alex for pointing this out)

For the remaining options I create a table of the possible values that the statistics can take:

\begin{array}{|c|c|c|c|} \hline (X_1,X_2,X_3)&T_1=X_1+2X_2+X_3&T_2=2X_1-X_2-X_3&T_3=3X_1+2X_2-4X_3\\ \hline(0,0,0)&0&0&0\\ \hline(0,0,1)&1&-1&-4\\ \hline(0,1,0)&2&-1&2\\ \hline (0,1,1)&3&-2&-2\\ \hline (1,0,0)&1&2&3\\ \hline (1,0,1)&2&1&-1\\ \hline (1,1,0)&3&1&5\\ \hline (1,1,1)&4&0&1\\ \hline \end{array}

I am checking whether $T_1,T_2,T_3$ are one-to-one functions of the sample $(X_1,X_2,X_3)$. If they are, then they are sufficient statistics. If you can see right away that the only bijection is $T_3:\{0,1\}^3\to\{-4,-2,-1,0,1,2,3,5\}$, then you are done, because a bijection with the sample implies that observing $T_3(X_1,X_2,X_3)$ and observing $(X_1,X_2,X_3)$ are equivalent. In other words, $T_3$ is a sufficient statistic for $p$.
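The one-to-one check can also be done by brute force over the 8 possible samples. This is only a small sketch; the names `stats`, `samples` and `is_injective` are mine, chosen for illustration:

```python
from itertools import product

# Candidate statistics from options (B), (C), (E); T1, T2, T3 follow the table above.
stats = {
    "T1": lambda x1, x2, x3: x1 + 2 * x2 + x3,
    "T2": lambda x1, x2, x3: 2 * x1 - x2 - x3,
    "T3": lambda x1, x2, x3: 3 * x1 + 2 * x2 - 4 * x3,
}

# All 8 possible outcomes of (X1, X2, X3).
samples = list(product([0, 1], repeat=3))

def is_injective(f):
    """True if f takes a different value on each of the 8 samples."""
    values = [f(*s) for s in samples]
    return len(set(values)) == len(samples)

injective = {name: is_injective(f) for name, f in stats.items()}
print(injective)  # {'T1': False, 'T2': False, 'T3': True}
```

Note that being one-to-one on the sample is enough for sufficiency but not required for it: $T=X_1+X_2+X_3$ is sufficient without being one-to-one.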

Alternatively, consider the case $T_1=2$, which arises from the tuples $(0,1,0)$ and $(1,0,1)$. Here $P\big((X_1,X_2,X_3)=(0,1,0)\mid T_1=2\big)=\dfrac{p(1-p)^2}{p(1-p)^2+p^2(1-p)}=1-p$, so the conditional distribution $P(\{X_1,X_2,X_3\}\mid T_1)$ depends on $p$. A similar argument holds for $T_2$ if you consider the case $T_2=0$. It is also apparent from the table above that $P(\{X_1,X_2,X_3\}\mid T_3)$ is independent of $p$ for all possible values of $T_3$, because $T_3$ takes 8 distinct values corresponding to the 8 different tuples.
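To see the dependence on $p$ concretely, the conditional distributions can be computed exactly for two values of $p$. This is a rough sketch, not a general method; the helpers `prob` and `conditional` are hypothetical names, and $p=1/4$, $p=3/4$ are arbitrary choices:

```python
from fractions import Fraction
from itertools import product

def prob(sample, p):
    """P((X1, X2, X3) = sample) for an i.i.d. Bernoulli(p) sample."""
    k = sum(sample)
    return p**k * (1 - p)**(3 - k)

def conditional(stat, value, p):
    """Conditional distribution of the sample given stat(sample) == value."""
    matching = [s for s in product([0, 1], repeat=3) if stat(*s) == value]
    total = sum(prob(s, p) for s in matching)
    return {s: prob(s, p) / total for s in matching}

def t1(x1, x2, x3):
    return x1 + 2 * x2 + x3   # option (B)

def t(x1, x2, x3):
    return x1 + x2 + x3       # the usual sufficient statistic T

for p in (Fraction(1, 4), Fraction(3, 4)):
    # Given T1 = 2 the weights change with p; given T = 1 they stay at 1/3 each.
    print(p, conditional(t1, 2, p), conditional(t, 1, p))
```

Given $T_1=2$, the tuple $(0,1,0)$ gets conditional probability $3/4$ when $p=1/4$ and $1/4$ when $p=3/4$, whereas given $T=1$ each of the three matching tuples gets $1/3$ for both values of $p$.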

I used the following threads for reference: