[Math] Examples of sufficient statistics for non-exponential family distributions


I know that the Pitman-Koopman-Darmois theorem says that only exponential family distributions have sufficient statistics whose dimension stays constant as the sample size increases.

I further know that the Fisher-Neyman factorization theorem says that any distribution with a sufficient statistic can be factorized as
$f(x;\theta) = h(x)\cdot g(T(x);\theta).$
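(As a concrete instance of this factorization, a standard textbook example rather than anything specific to my question: for an i.i.d. sample $x = (x_1, \dots, x_n)$ from $N(\theta, 1)$,

$\prod_{i=1}^n \tfrac{1}{\sqrt{2\pi}} e^{-(x_i - \theta)^2/2} = \underbrace{(2\pi)^{-n/2} e^{-\sum_i x_i^2/2}}_{h(x)} \cdot \underbrace{e^{\theta \sum_i x_i - n\theta^2/2}}_{g(T(x);\theta)},$

so $T(x) = \sum_i x_i$ is sufficient.)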

Trivially, if $T(x)$ is a bijection, then $g$ could simply invert $T$ and recover the entire original sample, so $T$ would be "sufficient". But the way the Pitman-Koopman-Darmois theorem is always stated raises what I think is an obvious question, to which I can't seem to find a clear answer:

Are there any distributions which have sufficient statistics which grow in dimension as the sample size grows, and which are "lossy" functions of their input data?

In particular, I'm thinking that if there's some class of distributions where the dimension of $T$ grows sublinearly with the dataset size, then those distributions could be incredibly useful in distributed computational settings.

Best Answer

Your statement of the Pitman-Koopman-Darmois theorem is off; there is an additional assumption that the support $\mathcal X$ of $X_1$ does not change as $\theta$ changes, where $\theta$ parameterizes the family. A quick counterexample to the theorem as stated in the OP is the family $\{\mbox{Uniform}(0, \theta): \theta > 0\}$, for which $\max\{X_j, 1 \le j \le n\}$ is a sufficient statistic whose dimension does not grow with the sample size $n$.
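To see why, write out the joint density and apply the factorization theorem from the OP (a one-line check):

$f(x_1, \dots, x_n; \theta) = \prod_{i=1}^n \tfrac{1}{\theta} I(0 < x_i < \theta) = \underbrace{I\big(\min_j x_j > 0\big)}_{h(x)} \cdot \underbrace{\theta^{-n}\, I\big(\max_j x_j < \theta\big)}_{g(T(x);\theta)},$

so $T(x) = \max_j x_j$ is sufficient for every $n$. Note that the support $(0, \theta)$ moves with $\theta$, which is exactly what the extra assumption rules out.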

More in the spirit of your question, the answer is yes: there do exist distributions whose sufficient statistics are "lossy" even when the conditions of the PKD theorem are satisfied. Consider $X_1, X_2, \ldots$ i.i.d. from a Gamma distribution with known shape parameter $\alpha$ and unknown mean $\mu$, and $Z_1, Z_2, \ldots$ i.i.d. Bernoulli with known success probability $p$. Take $Y_i = Z_i X_i - (1 - Z_i)$, so that our sample $Y_1, Y_2, \ldots$ equals $X_i$ when $Z_i = 1$ and $-1$ otherwise. We only get information about $\mu$ when $Y_i \ne -1$, so our sufficient statistic is $\left(\sum_{i: Y_i \ne -1} Y_i,\ \#\{i: Y_i \ne -1\}\right)$, and the number of informative observations grows like $pn$ on average.
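A sketch of the factorization (my own filling-in, via the Fisher-Neyman theorem quoted in the question): writing $m = \#\{i : y_i \ne -1\}$ and parameterizing the Gamma with rate $\alpha/\mu$, the joint density of the sample is

$f(y; \mu) = (1-p)^{n-m} p^m \prod_{i: y_i \ne -1} \tfrac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)} y_i^{\alpha - 1} e^{-\alpha y_i/\mu} = h(y) \cdot \underbrace{\mu^{-m\alpha} \exp\!\Big({-\tfrac{\alpha}{\mu} \textstyle\sum_{i: y_i \ne -1} y_i}\Big)}_{g(T(y);\mu)},$

where $h(y)$ collects every factor free of $\mu$. The statistic discards the individual informative $y_i$ and keeps only their sum and count, so it is genuinely lossy.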

Sublinear growth is also possible, proceeding along the same train of thought, i.e. using mixtures of distributions, and indeed this gets at something that is useful in practice. Take $(X_1, Z_1), (X_2, Z_2), \ldots$ to be i.i.d. distributed according to an infinite mixture of normals,

$f(x, z \mid \pi, \mu) = \prod_{k = 1}^\infty \left[\pi_k\, N(x \mid \mu_k, 1)\right]^{I(z = k)},$

with $Z_i$ an indicator of which cluster $X_i$ belongs to (I'm not sure the representation via a density that I wrote is valid, but you should get the general idea). The dimension of the sufficient statistic should increase only when new clusters are discovered, and the rate at which new clusters appear can be controlled by taking $\{\pi_k\}_{k = 1}^\infty$ to be known and choosing them carefully. My hunch is that it should be easy to make the dimension grow at a rate of $\log(1 + Cn)$, since I think this is how fast the number of clusters grows in the Dirichlet process.
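Here is a minimal simulation sketch of that last idea (my own illustration, with hypothetical choices throughout: known geometric weights $\pi_k = 2^{-k}$ and, for concreteness, cluster means $\mu_k = k$). The per-cluster counts and sums form the growing sufficient statistic, and the number of clusters seen in $n$ draws should grow roughly like $\log_2 n$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical known weights pi_k = 2^{-k}, k = 1, 2, ...:
# a Geometric(1/2) draw gives P(Z = k) = 2^{-k}, so the expected
# number of distinct clusters seen in n draws grows like log2(n).
def draw_cluster(rng):
    return rng.geometric(0.5)

# Per-cluster sufficient statistics: counts n_k and sums of x.
counts, sums = {}, {}

checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    k = draw_cluster(rng)
    x = rng.normal(loc=float(k), scale=1.0)  # hypothetical mu_k = k
    counts[k] = counts.get(k, 0) + 1
    sums[k] = sums.get(k, 0.0) + x
    if n in checkpoints:
        # Dimension of the statistic = 2 * (number of clusters seen).
        print(f"n = {n:>7}  clusters seen = {len(counts):>2}  "
              f"statistic dimension = {2 * len(counts)}")
```

In a distributed setting, each worker would only need to ship its (count, sum) pairs for the clusters it has actually observed, which is the appeal the OP mentions.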
