Solved – Understanding a characterization of minimal sufficient statistics

mathematical-statistics self-study sufficient-statistics

I have some questions regarding the proof of the theorem below.

First we need a definition:

A statistic $T$ is minimal sufficient iff $T$ is a function of any other sufficient statistic.

That is, for any sufficient statistic $S$ there exists a function $H$ such that $T=H(S)$.

Also consider this:

Let $\mathcal{K}$ be the set of all pairs $(\boldsymbol{x,y})$ for which there is a $k(\boldsymbol{x,y})>0$ such that $L(\theta;\boldsymbol{x}) = k(\boldsymbol{x,y})L(\theta;\boldsymbol{y})$.

Theorem

Let $T$ be a sufficient statistic for $\mathcal{P} = \{P_{\theta}:\theta \in \Theta \} $. If for all $(\boldsymbol{x,y}) \in \mathcal{K}, T(\boldsymbol{x}) =T(\boldsymbol{y}) $, then $T$ is minimal sufficient.
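As a standard concrete example of this condition: suppose $\boldsymbol{x} = (x_1,\ldots,x_n)$ consists of iid $\mathrm{Bernoulli}(\theta)$ observations and write $t(\boldsymbol{x}) = \sum_i x_i$. Then

$$\frac{L(\theta;\boldsymbol{x})}{L(\theta;\boldsymbol{y})} = \frac{\theta^{t(\boldsymbol{x})}(1-\theta)^{n-t(\boldsymbol{x})}}{\theta^{t(\boldsymbol{y})}(1-\theta)^{n-t(\boldsymbol{y})}} = \left(\frac{\theta}{1-\theta}\right)^{t(\boldsymbol{x})-t(\boldsymbol{y})},$$

which is free of $\theta$ precisely when $t(\boldsymbol{x}) = t(\boldsymbol{y})$. Hence $(\boldsymbol{x,y}) \in \mathcal{K}$ exactly when the sums agree, and the theorem tells us that the sufficient statistic $T(\boldsymbol{x}) = \sum_i x_i$ is minimal sufficient.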

The Proof

Let $S$ be an arbitrary sufficient statistic and let $(\boldsymbol{x,y})$ be any pair such that $S(\boldsymbol{x}) = S(\boldsymbol{y})$. By the factorization criterion we have:

\begin{equation}L(\theta; \boldsymbol{x}) = g(S(\boldsymbol{x});\theta)h(\boldsymbol{x}) \quad \text{and} \quad L(\theta; \boldsymbol{y})=g(S(\boldsymbol{y});\theta)h(\boldsymbol{y}) \end{equation}
and therefore, since $S(\boldsymbol{x}) = S(\boldsymbol{y})$ means the factors $g(S(\boldsymbol{x});\theta) = g(S(\boldsymbol{y});\theta)$ cancel,

\begin{equation}L(\theta;\boldsymbol{x}) = \dfrac{h(\boldsymbol{x})}{h(\boldsymbol{y})}L(\theta;\boldsymbol{y}) \end{equation}

Thus $(\boldsymbol{x,y}) \in \mathcal{K}$ and $T(\boldsymbol{y}) = T(\boldsymbol{x})$. But this means $T$ is a function of $S$.

Question

How does one see that $T$ is a function of $S$ in the proof?

Also, why are they imposing the restriction $S(\boldsymbol{x}) = S(\boldsymbol{y})$ in the proof? What if $S(\boldsymbol{x}) \neq S(\boldsymbol{y})$?

Best Answer

The symbols $\boldsymbol{x,y}$ refer to data. Associated with each possible value $v$ of the statistic $S$ is a collection of possible data values $S^{-1}(v)$ for which $S$ has the value $v$. Since $T$ has the same value (call it $w$) on every such dataset, we may define $H(v) = w$.

The case $S(\boldsymbol{x}) \ne S(\boldsymbol{y})$ is irrelevant to the theorem: defining $H$ only requires $T$ to be constant on each level set $S^{-1}(v)$, and a pair of datasets with different values of $S$ lies in two different level sets, so it imposes no constraint on $H$.


Let's interpret the theorem. Say that two datasets $\boldsymbol{x,y}$ are equivalent if the relative likelihood function

$$\theta \to \frac{L(\theta;\boldsymbol{x})}{L(\theta;\boldsymbol{y})}$$

is constant. This means that any analysis based on comparing likelihoods (for different values of $\theta$) will not make any distinction between two equivalent datasets. The theorem informs us that a sufficient statistic which never distinguishes between two equivalent datasets (that is, which has the same value on each) is minimal sufficient.

The proof of the theorem proceeds by noting that any two datasets having the same value of $S$ must be equivalent (provided that $S$ is sufficient) and therefore $T$ will have the same value on those datasets.

We might picture this by supposing that this equivalence relation among datasets partitions $\Omega$ into separate, non-overlapping components, each being a collection of equivalent datasets. Sufficient statistics have different values on different components: this guarantees that they can discriminate among inequivalent datasets. However, their values within any given component might vary (thereby discriminating among some equivalent datasets, too). Any minimal sufficient statistic, though, will be constant on each component: it will not discriminate between two equivalent datasets.
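To make this picture concrete, here is a small numerical sketch, assuming iid $\mathrm{Bernoulli}(\theta)$ data of length $3$, so $\Omega = \{0,1\}^3$. It builds the components by testing whether the relative likelihood is constant across a grid of $\theta$ values, then checks that $T(\boldsymbol{x}) = \sum_i x_i$ takes a single value on each component:

```python
# A numerical sketch of the partition picture (illustration only):
# Omega = {0,1}^3 with iid Bernoulli(theta) observations.
from itertools import product

import numpy as np

THETAS = np.linspace(0.1, 0.9, 9)  # grid of parameter values to test against

def likelihood(theta, x):
    """Bernoulli likelihood L(theta; x) for a 0/1 tuple x."""
    t = sum(x)
    return theta ** t * (1 - theta) ** (len(x) - t)

def equivalent(x, y):
    """True when theta -> L(theta; x) / L(theta; y) is constant on the grid."""
    ratios = [likelihood(th, x) / likelihood(th, y) for th in THETAS]
    return np.allclose(ratios, ratios[0])

# Partition Omega into components of mutually equivalent datasets.
omega = list(product([0, 1], repeat=3))
components = []
for x in omega:
    for component in components:
        if equivalent(x, component[0]):
            component.append(x)
            break
    else:
        components.append([x])

# T(x) = sum(x) should take exactly one value on each component.
for component in components:
    print(component, "-> T values:", {sum(x) for x in component})
```

The eight datasets fall into four components, one for each value of the sum, and each component reports a single $T$ value, as the theorem requires.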


The following is a formal mathematical demonstration that $T$ is a function of $S$.

Let the set of all possible such data be $\Omega$. A statistic, such as $S$ or $T$, assigns to each dataset $\boldsymbol{x}\in\Omega$ some kind of mathematical object that we can calculate with, such as a number or vector. The details don't matter, so suppose $S$ takes its values in a set $V$ and $T$ takes its values in a set $W$. If the function $H$ exists, then it is a map from $V$ to $W$ (depending on $S$, by the way):

$$\begin{array}{ccccc} & & \Omega & & \\ & {}^{S}\!\swarrow & & \searrow\!{}^{T} & \\ V & & \xrightarrow{\;H\;} & & W \end{array}$$

Given $T$ and $S$ as in the question, what we know is that

For all $\boldsymbol{x,y}\in\Omega$, $S(\boldsymbol{x}) = S(\boldsymbol{y})$ implies $T(\boldsymbol{x}) = T(\boldsymbol{y})$.

From this we would like to deduce the existence of a function $H:V\to W$ such that $T = H\circ S$: that is, $T(\boldsymbol{x}) = H(S(\boldsymbol{x}))$ for all $\boldsymbol{x}\in\Omega$.

$H$ can be found with an explicit construction. One way begins with a function $H^{*}$, defined on $V$, whose values are subsets of $W$:

$$H^{*}(v) = T(S^{-1}(v)) = \{T(\boldsymbol{x})\,|\, S(\boldsymbol{x}) = v\}.$$

I claim that all elements of $H^{*}(v)$ are equal, no matter what $v\in V$ might be. To prove the claim let $u, w\in H^{*}(v)$. By definition, this means there are $\boldsymbol{x,y}\in\Omega$ such that $u=T(\boldsymbol{x})$ and $w=T(\boldsymbol{y})$ and $S(\boldsymbol{x}) = S(\boldsymbol{y}) = v$. The latter equality implies $u = T(\boldsymbol{x}) = T(\boldsymbol{y}) = w$, proving the claim.

This claim enables us to define $H(v)$ whenever $H^{*}(v)$ is nonempty: it's the unique element of $H^{*}(v)$. To complete the definition of $H$, pick an arbitrary $w_0\in W$ and set $H(v) = w_0$ when $H^{*}(v)$ is empty. Formally,

$$H(v) = \left\{ \begin{array}{ll} w & \text{if } H^{*}(v) = \{w\}\\ w_0 & \text{if } H^{*}(v) = \emptyset. \end{array} \right. $$
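As a sanity check on this construction, the following sketch (an illustration, with $\Omega = \{0,1\}^3$ as before, the hypothetical choice of $S$ as the identity statistic, which is certainly sufficient, and $T$ as the sum) tabulates $H^{*}$, confirms that each nonempty $H^{*}(v)$ is a singleton, and recovers $T = H\circ S$:

```python
# Tabulate H*(v) = {T(x) : S(x) = v} and read off H (illustration only).
from itertools import product

omega = list(product([0, 1], repeat=3))  # all possible datasets

def S(x):
    return x       # a (maximal) sufficient statistic: the full data

def T(x):
    return sum(x)  # the candidate minimal sufficient statistic

# Collect the T values over each level set S^{-1}(v).
H_star = {}
for x in omega:
    H_star.setdefault(S(x), set()).add(T(x))

# The claim: every nonempty H*(v) is a singleton.
assert all(len(values) == 1 for values in H_star.values())

# Define H on the attained values of S; values v outside the range of S
# would get an arbitrary default w_0, as in the formal definition above.
H = {v: next(iter(values)) for v, values in H_star.items()}
assert all(T(x) == H[S(x)] for x in omega)
print("T = H o S on all of Omega")
```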
