Statistics – Sufficient Statistic for $\theta$ in the $N(\theta,\theta)$ Model

Tags: normal-distribution, probability-distributions, statistical-inference, statistics

I recently encountered an MCQ that goes like this:

Let $X_1,X_2,\dots,X_n$ be an independent random sample from $N(\theta,\theta)$, where both the mean and the variance equal the unknown parameter $\theta$. Which of the following statements is/are true?

(a) $\sum X_i^2$ is sufficient for $\theta$.
(b) $\sum X_i$ is sufficient for $\theta$.
(c) $\left(\sum X_i,\ \sum X_i^2\right)$ is sufficient for $\theta$.
(d) A sufficient statistic does not exist.

My Attempt: Using the factorization theorem,
$$f(x_i;\theta) = \frac{1}{\sqrt{2\pi\theta}}\, e^{-\frac{(x_i-\theta)^2}{2\theta}},$$
$$\begin{align}
\prod_{i=1}^{n} f(x_i;\theta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta}}\, e^{-\frac{(x_i-\theta)^2}{2\theta}}
= \left(\frac{1}{\sqrt{2\pi\theta}}\right)^{\!n} e^{-\frac{\sum_{i=1}^{n}(x_i-\theta)^2}{2\theta}} \\
&= \underbrace{\left(\frac{1}{\sqrt{2\pi\theta}}\right)^{\!n} e^{-\frac{\sum_{i=1}^{n} x_i^2}{2\theta}}\, e^{-\frac{n\theta}{2}}}_{f_1(t,\theta)}\ \underbrace{e^{\sum_{i=1}^{n} x_i}}_{f_2(\boldsymbol x)},
\end{align}$$
where $T=\sum_{i=1}^{n} X_i^2$ and $t$ is its observed value.

Thus, the statistic $T$ is sufficient for $\theta$, implying that (a) is true and (d) is false.
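
As a sanity check on this factorization, the likelihood ratio of two samples that share the same $\sum x_i^2$ should be free of $\theta$. A minimal sketch, assuming NumPy and SciPy are available (the two samples are arbitrary):

```python
import numpy as np
from scipy.stats import norm

def loglik(x, theta):
    # Log-likelihood of an i.i.d. N(theta, theta) sample (variance = theta).
    return norm.logpdf(x, loc=theta, scale=np.sqrt(theta)).sum()

x = np.array([3.0, 4.0])  # sum of squares 25, total 7
y = np.array([0.0, 5.0])  # sum of squares 25, total 5

for theta in [0.5, 1.0, 2.0, 10.0]:
    # The log-ratio should equal sum(x) - sum(y) = 2 for every theta.
    print(theta, loglik(x, theta) - loglik(y, theta))
```

The log-ratio stays constant across $\theta$ because every $\theta$-dependent factor involves the data only through $\sum x_i^2$, which the two samples share.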

However, the answer for the MCQ is given as (a), (b) and (c). I am not able to prove (b) and (c) using the factorization theorem. Please help.

Best Answer

In the case where the parametric family is normal with unknown mean $\mu$ and variance $\sigma^2$, we already know that a sufficient statistic (indeed, the joint MLE) is $T = (\hat \mu, \hat \sigma^2)$, where $$\hat \mu = \bar X, \quad \hat \sigma^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2.$$ So when we add the constraint $\mu = \sigma^2$, we immediately know that at least some data reduction is attainable through $T$, which excludes answer choice (d). More trivially still, the full sample is itself a sufficient statistic, one in which no data reduction occurs.
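
As a small illustration (a sketch only; NumPy is assumed, and the parameter value and sample size are arbitrary), the two components of $T$ are computed directly from the sample:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                              # hypothetical true parameter
x = rng.normal(loc=theta, scale=np.sqrt(theta), size=1000)

mu_hat = x.mean()                        # \bar X
sigma2_hat = np.mean((x - mu_hat) ** 2)  # (1/n) * sum (X_i - \bar X)^2
print(mu_hat, sigma2_hat)                # both should be close to theta = 2
```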

Moreover, if you have found that $\sum X_i^2$ is sufficient for $\theta$, then you already know that $(c)$ must be true in addition to $(a)$: any statistic from which a sufficient statistic can be computed is itself sufficient, and the pair in $(c)$ determines $\sum X_i^2$.

The only remaining issue is determining whether $\sum X_i$ alone is sufficient for $\theta$; that is to say, the truth of choice $(b)$ must be ascertained. The joint density is, as you computed,

$$\begin{align} f(\boldsymbol x \mid \theta) &= (2\pi)^{-n/2}\, \theta^{-n/2} \exp \left( -\sum_{i=1}^n \frac{(x_i - \theta)^2}{2\theta}\right) \\ &= (2\pi)^{-n/2}\, \theta^{-n/2} \exp \left( - \frac{1}{2\theta} \sum_{i=1}^n (x_i^2 - 2\theta x_i + \theta^2) \right) \\ &= (2\pi)^{-n/2}\, \theta^{-n/2}\, e^{-\sum x_i^2/(2\theta)}\, e^{\sum x_i}\, e^{-n\theta/2}, \end{align}$$ so for the choice $$h(\boldsymbol x) = (2\pi)^{-n/2}\, e^{\sum x_i}, \qquad T(\boldsymbol x) = \sum x_i^2, \qquad g(T \mid \theta) = \theta^{-n/2}\, e^{-T/(2\theta)}\, e^{-n\theta/2},$$

the factorization theorem shows that $T = \sum X_i^2$ alone is sufficient for $\theta$; the factor $e^{\sum x_i}$ is absorbed into $h(\boldsymbol x)$ and carries no information about $\theta$. Note that merely failing to exhibit a factorization through $\sum X_i$ does not by itself prove that $(b)$ is false; the counterexample below settles that.
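
For a quick numerical confirmation of this factorization, here is a minimal sketch, assuming SciPy (the value of $\theta$ and the sample are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta = 1.5
x = rng.normal(loc=theta, scale=np.sqrt(theta), size=5)
n = len(x)

# Joint density computed directly...
f = norm.pdf(x, loc=theta, scale=np.sqrt(theta)).prod()

# ...and via the factorization h(x) * g(T | theta).
h = (2 * np.pi) ** (-n / 2) * np.exp(x.sum())  # free of theta
T = np.sum(x ** 2)
g = theta ** (-n / 2) * np.exp(-T / (2 * theta)) * np.exp(-n * theta / 2)

print(f, h * g)  # the two values should agree to machine precision
```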


To remove any doubt about $(b)$ being false, we can construct two distinct samples whose totals are equal but whose sums of squares are not. For instance,

$$\boldsymbol x = (1, 2, 3, 2, 6), \quad \boldsymbol x^* = (1, 2, 3, 4, 4)$$ both have a sample total of $14$, but their sums of squares are $54$ and $46$, respectively. In fact, $\sum X_i^2$ is minimal sufficient here (by the likelihood-ratio criterion, since $f(\boldsymbol x \mid \theta)/f(\boldsymbol y \mid \theta)$ is free of $\theta$ exactly when $\sum x_i^2 = \sum y_i^2$), so any sufficient statistic must determine it. But if I told you that the sample total is $14$, you could not tell me whether the sample's sum of squares is, say, $54$ or $46$, because both values can arise from a sample with total $14$. Consequently, knowledge of the sample total alone does not preserve all of the information about $\theta$ that was present in the sample itself.
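
A short sketch (again assuming SciPy) makes the failure explicit: the log likelihood ratio of these two samples is $-(54-46)/(2\theta) = -4/\theta$, which varies with $\theta$, so samples with the same total do not carry the same information about $\theta$.

```python
import numpy as np
from scipy.stats import norm

x  = np.array([1.0, 2.0, 3.0, 2.0, 6.0])  # total 14, sum of squares 54
xs = np.array([1.0, 2.0, 3.0, 4.0, 4.0])  # total 14, sum of squares 46

def loglik(sample, theta):
    # Log-likelihood of an i.i.d. N(theta, theta) sample.
    return norm.logpdf(sample, loc=theta, scale=np.sqrt(theta)).sum()

for theta in [0.5, 1.0, 2.0, 4.0]:
    # The log-ratio equals -4/theta, which clearly depends on theta.
    print(theta, loglik(x, theta) - loglik(xs, theta))
```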
