Mathematical Statistics – Examples of Statistics Dependent on Sample Distribution


This is the definition of a statistic given on Wikipedia:

More formally, statistical theory defines a statistic as a function of a sample where the function itself is independent of the sample's distribution; that is, the function can be stated before realization of the data. The term statistic is used both for the function and for the value of the function on a given sample.

I think I understand most of this definition; however, I haven't been able to sort out the part where the function is said to be independent of the sample's distribution.

My understanding of a statistic so far

A sample is a set of realizations of some number of independent, identically distributed (iid) random variables with distribution F (for example, 10 realizations of a roll of a fair 20-sided die, 100 realizations of 5 rolls of a fair 6-sided die, or 100 people drawn at random from a population).

A function, whose domain is that set, and whose range is the real numbers (or maybe it can produce other things, like a vector or other mathematical object…) would be considered a statistic.

When I think of examples, the mean, median, and variance all make sense in this context: each is a function of a set of realizations (for example, blood pressure measurements from a random sample). I can also see how a linear regression model $y_{i} = \alpha + \beta \cdot x_{i}$ could be considered a statistic. Is this not just a function on a set of realizations?

Where I'm confused

Assuming that my understanding from above is correct, I haven't been able to understand where a function might not be independent of the sample's distribution. I've been trying to think of an example to make sense of it, but no luck. Any insight would be much appreciated!

Best Answer

That definition is a somewhat awkward way to state it. A "statistic" is any function of the observable values. All that definition means is that a statistic is a function only of the observable values, not a function of the distribution or any of its parameters. For example, if $X_1, X_2, \ldots, X_n \sim \text{N}(\mu, 1)$ with $\mu$ unknown, then a statistic would be any function $T(X_1,\ldots,X_n)$, whereas a function $H(X_1,\ldots,X_n, \mu)$ would not be a statistic, since it depends on $\mu$. Here are some further examples:

$$\begin{equation} \begin{aligned} \text{Statistic} & & & & & \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i, \\[12pt] \text{Statistic} & & & & & S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2, \\[12pt] \text{Not a statistic} & & & & & D_n = \bar{X}_n - \mu, \\[12pt] \text{Not a statistic} & & & & & p_i = \text{N}(x_i | \mu, 1), \\[12pt] \text{Not a statistic} & & & & & Q = 10 \mu. \\[12pt] \end{aligned} \end{equation}$$
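One way to read the list above is in terms of function signatures. The following is only an illustrative sketch in Python; the function names (`sample_mean`, `sample_variance`, `deviation_from_mu`) and the data values are made up for this example. A statistic can be evaluated from the observed values alone, whereas a non-statistic needs the unknown parameter as an extra input.

```python
from statistics import fmean

def sample_mean(xs):
    """Statistic: depends only on the observed values."""
    return fmean(xs)

def sample_variance(xs):
    """Statistic: the 1/n variance S_n^2 listed above."""
    xbar = fmean(xs)
    return fmean([(x - xbar) ** 2 for x in xs])

def deviation_from_mu(xs, mu):
    """Not a statistic: requires the unknown parameter mu as an input."""
    return fmean(xs) - mu

# Hypothetical observed data, used only for illustration.
observed = [4.1, 5.3, 4.8, 5.6]
print(sample_mean(observed), sample_variance(observed))
# deviation_from_mu(observed) cannot be evaluated: mu is not part of the data.
```

The first two functions can be written down and computed before knowing anything about $\mu$; the third cannot be computed from the sample at all unless someone supplies $\mu$.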

Every statistic is a function only of the observable values, and not of their distribution or its parameters. So there are no examples of a statistic that is a function of the distribution or its parameters (any such function would not be a statistic). However, it is important to note that the distribution of a statistic (as opposed to the statistic itself) will generally depend on the underlying distribution of the values. (This is true for all statistics other than ancillary statistics.)
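The difference between a statistic and the distribution of a statistic can be seen in a small simulation. This is a rough sketch under assumed settings (the helper `simulate_statistics`, the sample size, the number of repetitions, and the seed are all choices made for this illustration). The same two functions are applied to data drawn from $\text{N}(\mu, 1)$ for two values of $\mu$: the sample mean centres on a different value each time, while the 1/n sample variance, which is ancillary for $\mu$ in this location model, behaves the same way under both.

```python
import random
import statistics

def simulate_statistics(mu, n=50, reps=2000, seed=0):
    """Average the sample mean and 1/n sample variance over repeated samples.

    The functions applied to the data never see mu; only the data generator does.
    """
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))          # statistic: sample mean
        variances.append(statistics.pvariance(xs))  # statistic: 1/n variance
    return statistics.fmean(means), statistics.fmean(variances)

for mu in (0.0, 5.0):
    avg_mean, avg_var = simulate_statistics(mu)
    print(f"mu={mu}: average sample mean={avg_mean:.3f}, "
          f"average sample variance={avg_var:.3f}")
```

Under these assumptions the average sample mean should land near the value of $\mu$ used to generate the data, while the average sample variance should be close to $(n-1)/n$ in both cases.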


What about a function where the parameters are known?

In the comments below, Alecos asks an excellent follow-up question: what about a function that uses a fixed hypothesised value of the parameter? For example, what about the function $\sqrt{n} (\bar{x} - \mu)$, where $\mu = \mu_0$ is taken to be equal to a known hypothesised value $\mu_0 \in \mathbb{R}$? Here the function is indeed a statistic, so long as it is defined on the appropriately restricted domain. So the function $H_0: \mathbb{R}^n \rightarrow \mathbb{R}$ with $H_0(x_1,\ldots,x_n) = \sqrt{n} (\bar{x} - \mu_0)$ would be a statistic, but the function $H: \mathbb{R}^{n+1} \rightarrow \mathbb{R}$ with $H(x_1,\ldots,x_n, \mu) = \sqrt{n} (\bar{x} - \mu)$ would not be a statistic.
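The domain restriction can be sketched in code by fixing the hypothesised value in advance. In this hedged Python illustration, `make_h0` and `h` are hypothetical names chosen here, not anything from the original discussion. `make_h0(mu0)` returns a function of the data alone, corresponding to $H_0$, while `h` still takes the parameter as an argument, corresponding to $H$.

```python
import math
from statistics import fmean

def make_h0(mu0):
    """Fix the hypothesised value mu0 up front; the result maps R^n -> R."""
    def h0(xs):
        return math.sqrt(len(xs)) * (fmean(xs) - mu0)
    return h0

def h(xs, mu):
    """Maps R^(n+1) -> R: the parameter mu is an argument, so not a statistic."""
    return math.sqrt(len(xs)) * (fmean(xs) - mu)

# Hypothetical data and hypothesised value, used only for illustration.
h0 = make_h0(2.0)
data = [3.0, 3.0, 3.0, 3.0]
print(h0(data))      # computable from the data alone
print(h(data, 2.0))  # same number, but only because mu was supplied by hand
```

The two calls return the same number for the same $\mu_0$; the distinction lies entirely in what each function takes as input.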
