Mathematical Statistics – How to Determine if a Statistic is Sufficient for Variance?

mathematical-statistics, sufficient-statistics, variance

I have $X_1,\dots,X_n,X_{n+1}\overset{iid}{\sim}F_X(x)$, where $F_X$ has a finite mean $\mu$ and variance $\sigma^2$.

If I calculate $\bar X_n = \dfrac{1}{n}\sum_{i=1}^n X_i$ and $S^2_n = \dfrac{1}{n-1}\sum_{i=1}^n\left(X_i - \bar X_n\right)^2$ based on the first $n$ observations, I am able to use those, along with $n$ and $X_{n+1}$, to calculate $S^2_{n+1} = \dfrac{1}{(n+1)-1}\sum_{i=1}^{n+1}\left(X_i - \bar X_{n+1}\right)^2$ based on all $n+1$ observations.
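For concreteness, here is a minimal sketch of that update in Python. The function name `update_mean_var` and the sample data are made up for illustration, and the formula assumes $n \ge 2$ so that $S^2_n$ is defined. It relies on the standard identity $\sum_{i=1}^{n+1}(x_i - \bar x_{n+1})^2 = \sum_{i=1}^{n}(x_i - \bar x_n)^2 + \frac{n}{n+1}(x_{n+1} - \bar x_n)^2$.

```python
import statistics


def update_mean_var(mean_n, var_n, n, x_new):
    """Mean and sample variance of n+1 points, given only the summary
    (mean_n, var_n, n) of the first n points and the new point x_new.

    Hypothetical helper for illustration; assumes n >= 2.
    """
    diff = x_new - mean_n
    mean_next = mean_n + diff / (n + 1)
    # n * S^2_{n+1} = (n - 1) * S^2_n + n / (n + 1) * (x_new - mean_n)^2
    var_next = ((n - 1) * var_n + n * diff**2 / (n + 1)) / n
    return mean_next, var_next


# Illustrative data: summarise the first n points, then fold in the last one.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
*first, last = data
m, v = update_mean_var(
    statistics.mean(first), statistics.variance(first), len(first), last
)
print(m, statistics.mean(data))      # updated mean vs. mean of all points
print(v, statistics.variance(data))  # updated variance vs. variance of all points
```

The two printed pairs should agree up to floating-point rounding, which only shows that $(\bar X_{n+1}, S^2_{n+1})$ is a function of $(\bar X_n, S^2_n, n, X_{n+1})$; it says nothing by itself about sufficiency.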

Does this make $(\bar X_n, S^2_n, n, X_{n+1})$ a sufficient statistic for $\sigma^2?$ If not, is my function of those four values a sufficient statistic for $\sigma^2?$

Intuitively, I say this should be the case, since I have as much information to estimate $\sigma^2$ by having $(\bar X_n, S^2_n, n, X_{n+1})$ as I do from having all of the $X_i$ values, but I struggle to formally prove this or even begin to prove it.

Best Answer

No: your argument would apply equally well to any family of distributions, not just the family of distributions with finite mean & variance, & it's easy to come up with counterexamples where the sample variance is not a component of the sufficient statistic (e.g. the family of gamma distributions having various scales & shapes, for which the sample arithmetic & geometric means are jointly sufficient). Sufficient statistics of fixed dimension are updateable (see "When if ever is a median statistic a sufficient statistic?" for why the sample median can never be sufficient) but the converse doesn't follow.
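To make the gamma counterexample concrete, here is a short sketch of the factorization, writing the shape as $\alpha$ and the scale as $\beta$ (notation chosen here, not taken from the question). The joint density of $n$ i.i.d. gamma observations is

$$\prod_{i=1}^n \frac{x_i^{\alpha-1} e^{-x_i/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}} = \frac{1}{\Gamma(\alpha)^n \beta^{n\alpha}} \left(\prod_{i=1}^n x_i\right)^{\alpha-1} \exp\left(-\frac{1}{\beta}\sum_{i=1}^n x_i\right).$$

The data enter only through $\prod_i x_i$ and $\sum_i x_i$, that is, through the geometric and arithmetic means, so by the factorization theorem that pair is sufficient. The sample variance is not a function of that pair, even though it can still be updated one observation at a time.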

With i.i.d. samples from the non-parametric family you specify, the order statistic $(X_{(1)}, \ldots, X_{(n)})$ is minimal sufficient: all it discards is the order in which the observations arrived, which carries no information about the distribution from which they arise. It's also complete. Consequently the sample mean and variance, while not sufficient themselves, are functions of the order statistic, and so are not only unbiased estimators of their population analogues but the unique uniformly minimum-variance unbiased estimators.
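As a rough sketch of why the order statistic is sufficient, assume for simplicity that $F_X$ has a density $f$ (an assumption made here only to keep the notation short). A product does not depend on the order of its factors, so

$$\prod_{i=1}^n f(x_i) = \prod_{i=1}^n f\left(x_{(i)}\right),$$

and the joint density is a function of the order statistic alone. The sample mean and variance are symmetric in $x_1, \ldots, x_n$, hence functions of $(x_{(1)}, \ldots, x_{(n)})$. The Lehmann–Scheffé theorem then gives the conclusion: an unbiased estimator that is a function of a complete sufficient statistic is the unique uniformly minimum-variance unbiased estimator.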


If you know $(\bar X_n, S^2_n)$ is sufficient for a sample of size $n$, then $(\bar X_{n+1}, S^2_{n+1})$ is sufficient for a sample of size $n+1$. If you can show the latter statistic is a function of $(\bar X_n, S^2_n, X_{n+1})$, which is trivial, it follows that $(\bar X_n, S^2_n, X_{n+1})$ is also sufficient, as @whuber points out.
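Spelled out as a sketch, with $\theta$ standing for whatever indexes the family, $T = (\bar X_{n+1}, S^2_{n+1})$ and $U = (\bar X_n, S^2_n, X_{n+1})$ (symbols introduced here for the argument): the update is $T = g(U)$ with

$$\bar X_{n+1} = \frac{n\bar X_n + X_{n+1}}{n+1}, \qquad S^2_{n+1} = \frac{n-1}{n}\,S^2_n + \frac{\left(X_{n+1} - \bar X_n\right)^2}{n+1}.$$

If $T$ is sufficient, the factorization theorem gives

$$f(x;\theta) = h(x)\,k\big(T(x);\theta\big) = h(x)\,k\big(g(U(x));\theta\big) = h(x)\,\tilde k\big(U(x);\theta\big), \qquad \tilde k = k \circ g,$$

which is a factorization through $U$, so $U$ is sufficient too. The argument runs only in this direction: being able to compute $T$ from $U$ does not make $T$ sufficient in the first place.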
