Probability – Understanding Sufficient Statistic

probability, probability distributions, probability theory, sampling, statistics

A sufficient statistic for a parameter is a statistic that captures all the information about a given parameter contained in the sample.

My question: Is the above sentence correct? (I think it is.) If yes, then what is the purpose of a sufficient statistic? I mean, it does not give any additional information about the unknown parameter (to be estimated) that is not already present in the sample in the first place. So what is the use of sufficiency in Mathematical Statistics?

EDIT 1:

After @user164740's response:

My queries:

1) So does this mean that a sufficient statistic can contain less information about the parameter to be estimated than is present in the given sample?

2) And how would a worse statistic (in terms of information contained about the parameter) help if the given statistic is not helpful? I mean, how is the given sufficient statistic helpful, and how would a worse statistic be helpful in estimating a parameter?

Best Answer

Your definition of sufficiency is correct.

Sufficiency pertains to data reduction, not merely estimation. A sufficient statistic need not estimate anything. For example, if $X_1, \ldots, X_n$ are iid samples drawn from an exponential distribution with unknown mean $\theta$, then $\bar X$ is sufficient for $\theta$, but so is $(X_1 + \cdots + X_{n-1}, X_n)$. The former achieves greater data reduction; the latter achieves less, since it consists of two numbers. The former is itself an estimator of $\theta$; the latter does not estimate $\theta$ directly, so you need to transform it somehow. You could, for example, take $X_n$ from this sufficient statistic and use it as an estimator, but $X_n$ on its own is neither sufficient nor a particularly "good" estimator.
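As a sketch of why both statistics are sufficient here, one can apply the Fisher-Neyman factorization theorem to the exponential density in its mean parameterization, $f(x;\theta) = \theta^{-1} e^{-x/\theta}$ for $x > 0$:

$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta} = \underbrace{\theta^{-n} \exp\!\left(-\frac{1}{\theta}\sum_{i=1}^{n} x_i\right)}_{g\left(\sum_i x_i;\ \theta\right)} \cdot \underbrace{1}_{h(x_1, \ldots, x_n)}.$$

The joint density depends on the data only through $\sum_i x_i$ (equivalently $\bar X$), so $\bar X$ is sufficient; and since $\sum_i x_i = (x_1 + \cdots + x_{n-1}) + x_n$, the pair $(X_1 + \cdots + X_{n-1}, X_n)$ determines that same sum and is therefore sufficient as well.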

The purpose of sufficiency is to identify statistics that do not discard any information about the parameter; as such, estimators based on a sufficient statistic are, in that sense, "good" ones to choose.

In regard to your second question, let's go back to the exponential example. A non-sufficient statistic that was mentioned was $X_n$. This statistic simply discards all the previous observations and keeps only the last. And yes, it does estimate $\theta$: note $\operatorname{E}[X_n] = \theta$ by definition, so it is even an unbiased estimator. But does it perform well? No; its variance is $\theta^2$ regardless of the sample size, meaning that no matter how large a sample you take, the estimator's spread around the true value of $\theta$ never shrinks. And of course this makes intuitive sense: you've discarded all the previous observations.
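A minimal simulation sketch of that point (the value $\theta = 2$, the sample sizes, and the replication count are illustrative assumptions, not from the original post): the empirical variance of $X_n$ stays near $\theta^2 = 4$ however large $n$ is, while the variance of $\bar X$ shrinks like $\theta^2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0    # illustrative true mean (assumed value for the demo)
reps = 20_000  # Monte Carlo replications

for n in (5, 50, 500):
    # draw `reps` independent samples of size n from an exponential with mean theta
    samples = rng.exponential(scale=theta, size=(reps, n))
    last_obs = samples[:, -1]           # estimator that keeps only the last observation, X_n
    sample_mean = samples.mean(axis=1)  # estimator based on the sufficient statistic, X-bar
    print(f"n={n:4d}  Var(X_n) ~ {last_obs.var():.3f}   Var(Xbar) ~ {sample_mean.var():.4f}")
```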

A better estimator would be the mean of all the odd-numbered observations, e.g., $(X_1 + X_3 + \cdots + X_{2n-1})/n$ for a sample of size $2n$; and yes, this too is an unbiased estimator of $\theta$. Still, you can see why it's not as good as the mean of all the observations. It does achieve data reduction, but since it is not a sufficient statistic, it "wastes" too much. That's what being able to show sufficiency gets you: if an estimator is sufficient, it isn't "wasteful."
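To make the "wastefulness" concrete, here is a quick variance comparison under the same exponential setup (a sketch; a sample of size $2n$ is assumed so the odd-indexed mean averages exactly $n$ observations):

$$\operatorname{Var}(\bar X) = \frac{\theta^2}{2n}, \qquad \operatorname{Var}\!\left(\frac{X_1 + X_3 + \cdots + X_{2n-1}}{n}\right) = \frac{\theta^2}{n}, \qquad \operatorname{Var}(X_{2n}) = \theta^2.$$

All three estimators are unbiased, but only the one built from the sufficient statistic attains the smallest variance: the odd-indexed mean discards half the sample and pays with twice the variance, and the last observation alone never improves with the sample size.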
