Minimal Sufficient Statistic – Basic Intuition and Understanding

Tags: intuition, mathematical-statistics, sufficient-statistics

As stated by Wikipedia:

A sufficient statistic is minimal sufficient if it can be represented as a function of any other sufficient statistic. In other words, $S(X)$ is minimal sufficient if and only if
$S(X)$ is sufficient, and
if $T(X)$ is sufficient, then there exists a function $f$ such that $S(X) = f(T(X))$.
Intuitively, a minimal sufficient statistic most efficiently captures all possible information about the parameter $\theta$.

I have some trouble understanding the full meaning of "minimal".

What I don't understand is what happens if we have a third sufficient statistic, call it $U(X)$, such that:

$U(X)=g(S(X))=g(f(T(X)))$

In this case, are both $U(X)$ and $S(X)$ minimal? I ask because "minimal" makes me think that it must be the "most minimal", so there can be only one (group of) minimal statistics.

If I am not wrong, if $f$ and $g$ are both invertible, then the three sufficient statistics are all minimal, but in this case they all belong to the same group. If $f$ and $g$ are not invertible, the three statistics $U(X)$, $S(X)$, $T(X)$ all differ in how efficiently they capture information.

Best Answer

Let the sample space be $\mathcal{X}$. A sufficient statistic $T$ can then be seen as indexing a partition of $\mathcal{X}$: $T(x)=T(y)$ if and only if $x$ and $y$ belong to the same element of the partition. A minimal sufficient statistic gives a maximal reduction of the data. That is, if $T$ is minimal sufficient and we take the partition corresponding to $T$, pick two distinct elements of that partition, and make a new partition by replacing the two with their union, then the resulting statistic is no longer sufficient.

So any other sufficient statistic, say $S$, which is not minimal, has a partition that is a refinement of the partition of $T$: every element of the partition of $T$ is a union of elements of the partition of $S$ (this becomes easier to understand if you make a drawing from my text!). When you know the value of $S$, you know which element of the partition of $S$ the sample point belongs to, and therefore also which element of the partition of $T$ it belongs to, since that partition is coarser. That is what it means to say that $T$ is a function of every other sufficient statistic: every other sufficient statistic gives more information (or the same information) about the sample than $T$ does.
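The partition picture can be checked by brute force on a tiny sample space. The sketch below is only an illustration under assumptions that are not in the text above: the data are three Bernoulli($\theta$) trials, $T$ is the sum (minimal sufficient for that model), $S$ keeps the first trial separately, and $U$ is a one-to-one relabeling of $T$. The helper names `partition_of` and `is_refinement` are made up for this example.

```python
from collections import defaultdict
from itertools import product


def partition_of(statistic, sample_space):
    """Group sample points into cells on which the statistic is constant."""
    cells = defaultdict(set)
    for x in sample_space:
        cells[statistic(x)].add(x)
    return {frozenset(cell) for cell in cells.values()}


def is_refinement(fine, coarse):
    """True if every cell of `fine` lies inside some cell of `coarse`."""
    return all(any(cell <= big for big in coarse) for cell in fine)


# Assumed model: three Bernoulli(theta) trials, so 8 possible samples.
sample_space = list(product((0, 1), repeat=3))


def T(x):
    """Sum of the trials: minimal sufficient in the Bernoulli model."""
    return sum(x)


def S(x):
    """Sufficient but not minimal: remembers the first trial separately."""
    return (x[0], x[1] + x[2])


def U(x):
    """One-to-one relabeling of T, so it carries exactly the same information."""
    return 2 * sum(x) + 1


def f(s):
    """The function with T = f(S)."""
    return s[0] + s[1]


part_T = partition_of(T, sample_space)
part_S = partition_of(S, sample_space)
part_U = partition_of(U, sample_space)

print(len(part_T), len(part_S))                   # number of cells in each partition
print(is_refinement(part_S, part_T))              # S's partition is finer than T's
print(is_refinement(part_T, part_S))              # but not the other way round
print(part_U == part_T)                           # invertible relabeling: same partition
print(all(T(x) == f(S(x)) for x in sample_space)) # T is a function of S
```

In this toy case $T$ and $U$ index the same partition, which is the sense in which minimal sufficient statistics form a single "group": they differ only by a one-to-one relabeling. $S$ is sufficient too, but its partition is strictly finer, so $T$ can be computed from $S$ and not the other way round.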

Definition: A partition of $\mathcal{X}$ is a collection of subsets $\mathcal{X}_\alpha$ of $\mathcal{X}$ such that $\bigcup_{\alpha} \mathcal{X}_\alpha = \mathcal{X}$ and $\mathcal{X}_\alpha \cap \mathcal{X}_\beta = \emptyset$ unless the two elements of the partition are identical, that is, $\alpha=\beta$.
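The two conditions in the definition can be tested directly on a finite sample space. This is a small sketch, not a general tool: `is_partition` is a hypothetical helper, and the space of two binary trials is an assumption chosen just to keep the example small.

```python
from itertools import product


def is_partition(cells, space):
    """Check that the cells cover `space` and are pairwise disjoint."""
    cells = [set(c) for c in cells]
    union = set().union(*cells)
    covers = union == set(space)
    # Pairwise disjoint cells have sizes that add up to the size of their union.
    disjoint = sum(len(c) for c in cells) == len(union)
    return covers and disjoint


# Assumed sample space: two binary trials.
space = list(product((0, 1), repeat=2))

# Level sets of the sum: a valid partition.
by_sum = [{(0, 0)}, {(0, 1), (1, 0)}, {(1, 1)}]

# (0, 1) appears in two cells, so the cells are not disjoint.
overlapping = [{(0, 0), (0, 1)}, {(0, 1), (1, 0)}, {(1, 1)}]

# (0, 1) and (1, 0) are missing, so the cells do not cover the space.
incomplete = [{(0, 0)}, {(1, 1)}]

print(is_partition(by_sum, space))
print(is_partition(overlapping, space))
print(is_partition(incomplete, space))
```

The level sets $\{x : T(x) = t\}$ of any statistic always pass this check, which is why a statistic can be identified with the partition it indexes.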