Solved – Sufficient statistic, specifics/intuition problems

mathematical-statistics, sufficient-statistics

I'm teaching myself some statistics for fun and I have some confusion regarding sufficient statistics. I'll write out my confusions in list format:

  1. If a distribution has $n$ parameters, will it have $n$ sufficient statistics?

  2. Is there any sort of direct correspondence between the sufficient statistics and the parameters? Or do the sufficient statistics just serve as a pool of "information" that lets us recreate the setting and calculate the same estimates for the parameters of the underlying distribution?

  3. Do all distributions have sufficient statistics, i.e. can the factorization theorem ever fail?

  4. Using our sample of data, we assume a distribution the data are most likely to have come from, and then we can calculate estimates (e.g. the MLE) of the parameters of that distribution. Sufficient statistics are a way to calculate the same parameter estimates without having to rely on the data itself, right?

  5. Will all sets of sufficient statistics have a minimal sufficient statistic?

This is the material which I am using to try to understand the topic matter:
https://onlinecourses.science.psu.edu/stat414/node/283

From what I understand, the factorization theorem separates the joint distribution into two functions, but I do not understand how we are able to extract the sufficient statistic once the distribution has been factorized into those two functions.

  1. The Poisson question given in this example had a clear factorization, but it was then stated that the sufficient statistics were the sample mean and the sample sum. How did we know that those were the sufficient statistics just by looking at the form of the first factor?

  2. How is it possible to obtain the same MLE estimates using sufficient statistics if the second factor of the factorization will sometimes depend on the data values $X_i$ themselves? For instance, in the Poisson case the second function depended on the inverse of the product of the factorials of the data, and we would no longer have the data!

  3. Why would the sample size $n$ not be a sufficient statistic, in relation to the Poisson example on the webpage? We would require $n$ to reconstruct certain parts of the first factor, so why is it not a sufficient statistic as well?

Best Answer

You'd probably benefit from reading about sufficiency in any textbook on theoretical statistics, where most of these questions will be covered in detail. Briefly ...

  1. Not necessarily; those are special cases. Among distributions where the support (the range of values the data can take) doesn't depend on the unknown parameter(s), only those in the exponential family have a sufficient statistic of the same dimensionality as the number of parameters. So for estimating the shape & scale of a Weibull distribution or the location & scale of a logistic distribution from independent observations, the order statistic (the whole set of observations disregarding their sequence) is minimal sufficient; you can't reduce it further without losing information about the parameters. Where the support does depend on the unknown parameter(s) it varies: for a uniform distribution on $(0,\theta)$, the sample maximum is sufficient for $\theta$ (see the factorization sketch after this list); for a uniform distribution on $(\theta-1,\theta+1)$ the sample minimum and maximum are together sufficient.

  2. I don't know what you mean by "direct correspondence"; the alternative you give seems a fair way to describe sufficient statistics.

  3. Yes: trivially, the data as a whole are sufficient. (If you hear someone say there's no sufficient statistic, they mean there's no low-dimensional one.)

  4. Yes, that's the idea. (What's left—the distribution of the data conditional on the sufficient statistic—can be used for checking the distributional assumption independently of the unknown parameter(s).)

  5. Apparently not, though I gather the counter-examples are not distributions you're likely to want to use in practice. [It'd be nice if anyone could explain this without getting too heavily into measure theory.]
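
To make the uniform $(0,\theta)$ case in point 1 concrete, here's a standard factorization sketch (generic notation $g$ & $h$, not taken from the linked page): for independent observations $x_1,\ldots,x_n$ from a uniform distribution on $(0,\theta)$,

$$f(x;\theta)=\prod_{i=1}^n \frac{1}{\theta}\,\mathbf{1}\{0<x_i<\theta\} \;=\; \underbrace{\theta^{-n}\,\mathbf{1}\{x_{(n)}<\theta\}}_{g(x_{(n)};\,\theta)}\;\cdot\;\underbrace{\mathbf{1}\{x_{(1)}>0\}}_{h(x)},$$

where $x_{(1)}$ & $x_{(n)}$ denote the sample minimum & maximum. The first factor depends on the data only through $x_{(n)}$ and the second doesn't involve $\theta$, so by the factorization theorem the sample maximum is sufficient for $\theta$.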

In response to the further questions ...

  1. The first factor, $\mathrm{e}^{-n\lambda}\cdot\lambda^{\sum x_i}$, depends on the data only through $\sum x_i$. So any one-to-one function of $\sum x_i$ is sufficient: $\sum x_i$, $\sum x_i/n$, $(\sum x_i)^2$†, & so on.

  2. The second factor, $\tfrac{1}{x_1! x_2! \ldots x_n!}$, doesn't depend on $\lambda$ & so won't affect the value of $\lambda$ at which $f(x;\lambda)$ is maximized. Derive the MLE & see for yourself (a sketch of the derivation follows this list).

  3. The sample size $n$ is a known constant rather than a realized value of a random variable, so isn't considered part of the sufficient statistic‡; the same goes for known parameters other than the ones you want to infer things about.
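
Here's the sketch promised in point 2 (it's just the same factorization as in point 1, written out in full): for independent Poisson observations,

$$f(x;\lambda)=\prod_{i=1}^n\frac{\mathrm{e}^{-\lambda}\lambda^{x_i}}{x_i!} \;=\; \mathrm{e}^{-n\lambda}\,\lambda^{\sum x_i}\cdot\frac{1}{x_1!\,x_2!\cdots x_n!},$$

so the log-likelihood is

$$\ell(\lambda) = -n\lambda + \Big(\sum x_i\Big)\log\lambda - \sum_i \log(x_i!).$$

The last term doesn't involve $\lambda$, so it vanishes on differentiation: $\ell'(\lambda) = -n + \sum x_i/\lambda = 0$ gives $\hat\lambda = \sum x_i/n = \bar{x}$. Knowing $\sum x_i$ (together with the known constant $n$) is therefore enough to compute the MLE; the factorials never enter.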

† In this case squaring is one-to-one because $\sum x_i$ is non-negative.

‡ When $n$ is a realized value of the random variable $N$, then it will be part of the sufficient statistic, $(\sum x_i,n)$. Say you choose a sample size of 10 or 100 by tossing a coin: $n$ tells you nothing about the value of $\lambda$ but does affect how precisely you can estimate it; in this case it's called an ancillary complement to $\sum x_i$ & inference can proceed by conditioning on its realized value—in effect ignoring that it might have come out differently.
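
If it helps to see the sufficiency point numerically, here's a minimal Python sketch (assuming NumPy & SciPy are available; the seed, sample size, & true rate are arbitrary choices for illustration, not anything from the linked course): the MLE found by maximizing the full-data log-likelihood agrees with the one computed from $\sum x_i$ & $n$ alone.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)            # arbitrary seed for reproducibility
lam_true = 3.5                             # arbitrary "unknown" rate, for illustration only
x = rng.poisson(lam_true, size=50)         # simulated Poisson sample

# MLE from the full data: maximize the log-likelihood over a fine grid of candidate rates
grid = np.linspace(0.1, 10, 10_000)
loglik = np.array([poisson.logpmf(x, lam).sum() for lam in grid])
mle_from_data = grid[loglik.argmax()]

# MLE from the sufficient statistic (sum of x, together with n) alone: the sample mean
mle_from_sufficient_stat = x.sum() / x.size

print(mle_from_data, mle_from_sufficient_stat)   # agree up to the grid resolution
```

The factorials enter every log-likelihood value only as an additive constant, which is why dropping the second factor can't change where the maximum sits.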
