Is any quantitative property of the population a “parameter”?

Tags: estimation, population, sample, terminology

I'm relatively familiar with the distinction between the terms statistic and parameter. I see a statistic as the value obtained from applying a function to the sample data. However, most examples of parameters relate to defining a parametric distribution. A common example is the mean and standard deviation to parameterise the normal distribution or the coefficients and error variance to parameterise a linear regression.

However, there are many other values of the population distribution that are less prototypical (e.g., minimum, maximum, r-square in multiple regression, the .25 quantile, median, the number of predictors with non-zero coefficients, skewness, the number of correlations in a correlation matrix greater than .3, etc.).

Thus, my questions are:

  • Should any quantitative property of a population be labelled a "parameter"?
  • If yes, then why?
  • If no, what characteristics should not be labelled a parameter? What should they be labelled? And why?

Elaboration on confusion

The Wikipedia article on estimators states:

An "estimator" or "point estimate" is a statistic (that is, a function
of the data) that is used to infer the value of an unknown parameter
in a statistical model.

But I can define the unknown value as the .25 quantile and I can develop an estimator for that unknown. That is, not all quantitative properties of a population are parameters in the same way that, say, the mean and sd are parameters of a normal distribution, yet it is legitimate to seek to estimate any quantitative population property. A small sketch of that idea appears below.
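(For concreteness, here is a minimal sketch of estimating a population .25 quantile, assuming Python with NumPy and a hypothetical Exponential(1) population; both the library and the example distribution are illustrative choices, not part of the question.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: an Exponential(1) distribution.
# Its true .25 quantile is -ln(0.75) ≈ 0.2877.
true_q25 = -np.log(0.75)

# A statistic: apply a function (the empirical .25 quantile) to sample data
# and use it to estimate the corresponding population quantity.
sample = rng.exponential(scale=1.0, size=1_000)
estimate = np.quantile(sample, 0.25)

print(f"population .25 quantile: {true_q25:.4f}")
print(f"sample .25 quantile:     {estimate:.4f}")
```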

Best Answer

This question goes to the heart of what statistics is and how to conduct a good statistical analysis. It raises many issues, some of terminology and others of theory. To clarify them, let's begin by noting the implicit context of the question and go on from there to define the key terms "parameter," "property," and "estimator." The several parts of the question are answered as they come up in the discussion. The final concluding section summarizes the key ideas.

State spaces

A common statistical use of "the distribution," as in "the Normal distribution with PDF proportional to $\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)\,dx$" is actually a (serious) abuse of English, because obviously this is not one distribution: it's a whole family of distributions parameterized by the symbols $\mu$ and $\sigma$. A standard notation for this is the "state space" $\Omega$, a set of distributions. (I am simplifying a bit here for the sake of exposition and will continue to simplify as we go along, while remaining as rigorous as possible.) Its role is to delineate the possible targets of our statistical procedures: when we estimate something, we are picking out one (or sometimes more) elements of $\Omega$.

Sometimes state spaces are explicitly parameterized, as in $\Omega = \{\mathcal{N}(\mu, \sigma^2)|\mu \in \mathbb{R}, \sigma \gt 0\}$. In this description there is a one-to-one correspondence between the set of tuples $\{(\mu,\sigma)\}$ in the upper half plane and the set of distributions we will be using to model our data. One value of such a parameterization is that we may now refer concretely to distributions in $\Omega$ by means of an ordered pair of real numbers.
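As a rough illustration of such a correspondence (my own sketch, assuming Python with SciPy, which the answer itself does not prescribe), a parameterization is literally a function from points of the upper half plane to members of $\Omega$:

```python
from scipy import stats

def parameterization(mu: float, sigma: float):
    """Map one point (mu, sigma) of the upper half plane to one member
    of Omega = {N(mu, sigma^2) : mu real, sigma > 0}."""
    if sigma <= 0:
        raise ValueError("(mu, sigma) must lie in the upper half plane: sigma > 0")
    return stats.norm(loc=mu, scale=sigma)

# Distinct parameter values label distinct distributions in Omega.
F1 = parameterization(0.0, 1.0)
F2 = parameterization(2.0, 0.5)
print(F1.mean(), F1.std())   # 0.0 1.0
print(F2.mean(), F2.std())   # 2.0 0.5
```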

In other cases state spaces are not explicitly parameterized. An example would be the set of all unimodal continuous distributions. Below, we will address the question of whether an adequate parameterization can be found in such cases anyway.

Parameterizations

Generally, a parameterization of $\Omega$ is a correspondence (mathematical function) from a subset of $\mathbb{R}^d$ (with $d$ finite) to $\Omega$. That is, it uses ordered $d$-tuples of real numbers to label the distributions. But it's not just any correspondence: it has to be "well behaved." To understand this, consider the set of all continuous distributions with finite expectation. This would widely be regarded as "non-parametric" in the sense that any "natural" attempt to parameterize this set would involve a countable sequence of real numbers (using an expansion in any orthogonal basis). Nevertheless, because this set has the cardinality of the continuum, which is the cardinality of the reals, there must exist some one-to-one correspondence between these distributions and $\mathbb{R}$. Paradoxically, that would seem to make this a parameterized state space with a single real parameter!

The paradox is resolved by noting that a single real number cannot enjoy a "nice" relationship with the distributions: when we change the value of that number, the distribution it corresponds to must in some cases change in radical ways. We rule out such "pathological" parameterizations by requiring that distributions corresponding to close values of their parameters must themselves be "close" to one another. Discussing suitable definitions of "close" would take us too far afield, but I hope this description is enough to demonstrate that there is much more to being a parameter than just naming a particular distribution.

Properties of distributions

Through repeated use, we become accustomed to thinking of a "property" of a distribution as some intelligible quantity that frequently appears in our work, such as its expectation, variance, and so on. The problem with this as a possible definition of "property" is that it's too vague and not sufficiently general. (This is where mathematics was in the mid-18th century, when "functions" were thought of as finite processes applied to objects.) Instead, about the only sensible definition of "property" that will always work is to think of a property as being a number that is uniquely assigned to every distribution in $\Omega$. This includes the mean, the variance, any moment, any algebraic combination of moments, any quantile, and plenty more, including things that cannot even be computed. However, it does not include things that would make no sense for some of the elements of $\Omega$. For instance, if $\Omega$ consists of all Student t distributions, then the mean is not a valid property for $\Omega$ (because $t_1$ has no mean). This impresses on us once again how much our ideas depend on what $\Omega$ really consists of.
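To make the "property as a function on $\Omega$" idea concrete, here is a small sketch (my own, using SciPy frozen distributions as stand-ins for elements of $\Omega$; the answer itself does not prescribe any software):

```python
from scipy import stats

# A "property" is a function that assigns a number to each distribution in Omega.
mean_property     = lambda F: F.mean()
variance_property = lambda F: F.var()
q25_property      = lambda F: F.ppf(0.25)   # the .25 quantile

F = stats.norm(loc=3.0, scale=2.0)
print(mean_property(F), variance_property(F), q25_property(F))

# But a property must make sense for *every* element of Omega.  If Omega
# contains the Student t distribution with 1 degree of freedom, "mean" is
# not a valid property: SciPy reports a non-finite value for it.
t1 = stats.t(df=1)
print(mean_property(t1))   # inf/nan rather than a finite number
```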

Properties are not always parameters

A property can be such a complicated function that it would not serve as a parameter. Consider the case of the "Normal distribution," where $\Omega$ is the whole Normal family. We might want to know whether the true distribution's mean, when rounded to the nearest integer, is even. That's a property. But it will not serve as a parameter.
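A sketch of that example (again my own, with SciPy standing in for the Normal family): the quantity is perfectly well defined for every member of $\Omega$, yet it jumps between 0 and 1 as $\mu$ varies continuously, so nearby values of this number do not correspond to nearby distributions.

```python
from scipy import stats

def mean_rounded_is_even(F) -> float:
    """A legitimate property: 1.0 if F's mean, rounded to the nearest
    integer, is even; 0.0 otherwise."""
    return float(round(F.mean()) % 2 == 0)

# Defined for every Normal distribution, but useless as a label:
for mu in (0.3, 0.7, 1.2, 1.6):
    print(mu, mean_rounded_is_even(stats.norm(loc=mu, scale=1.0)))
```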

Parameters are not necessarily properties

When parameters and distributions are in one-to-one correspondence then obviously any parameter, and any function of the parameters for that matter, is a property according to our definition. But there need not be a one-to-one correspondence between parameters and distributions: sometimes a few distributions must be described by two or more distinctly different values of the parameters. For instance, a location parameter for points on the sphere would naturally use latitude and longitude. That's fine--except at the two poles, each of which corresponds to a single latitude (±90 degrees) together with every valid longitude. The location (point on the sphere) indeed is a property but its longitude is not necessarily a property. Although there are various dodges (just declare the longitude of a pole to be zero, for instance), this issue highlights the important conceptual difference between a property (which is uniquely associated with a distribution) and a parameter (which is a way of labeling the distribution and might not be unique).
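To see the non-uniqueness concretely, here is a tiny numerical sketch (my own, assuming NumPy) of the latitude-longitude labeling of points on the unit sphere:

```python
import numpy as np

def sphere_point(lat_deg: float, lon_deg: float) -> np.ndarray:
    """Map (latitude, longitude), in degrees, to a point on the unit sphere."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

# The labeling is not one-to-one: at the north pole (latitude 90),
# every longitude names the same point.
print(sphere_point(90, 0))     # approximately [0, 0, 1]
print(sphere_point(90, 123))   # the same point, from different parameter values
```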

Statistical procedures

The target of an estimate is called an estimand. It is merely a property. The statistician is not free to select the estimand: that is the province of her client. When someone comes to you with a sample of a population and asks you to estimate the population's 99th percentile, you would likely be remiss in supplying an estimator of the mean instead! Your job, as statistician, is to identify a good procedure for estimating the estimand you have been given. (Sometimes your job is to persuade your client that he has selected the wrong estimand for his scientific objectives, but that's a different issue...)

By definition, a procedure is a way to get a number out of the data. Procedures are usually given as formulas to be applied to the data, like "add them all up and divide by their count." Literally any procedure may be pronounced an "estimator" of a given estimand. For instance, I could declare that the sample mean (a formula applied to the data) estimates the population variance (a property of the population, assuming our client has restricted the set of possible populations $\Omega$ to include only those that actually have variances).

Estimators

An estimator needn't have any obvious connection to the estimand. For instance, do you see any connection between the sample mean and a population variance? Neither do I. But nevertheless, the sample mean actually is a decent estimator of the population variance for certain $\Omega$ (such as the set of all Poisson distributions). Herein lies one key to understanding estimators: their qualities depend on the set of possible states $\Omega$. But that's only part of it.
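A quick simulation of that Poisson example (my own sketch, assuming NumPy): because a Poisson($\lambda$) distribution has mean and variance both equal to $\lambda$, the sample mean tracks the population variance.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 4.0   # for a Poisson distribution, mean = variance = lam

sample = rng.poisson(lam, size=10_000)
print("population variance:", lam)
print("sample mean:        ", sample.mean())          # also estimates the variance
print("sample variance:    ", sample.var(ddof=1))
```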

A competent statistician will want to know how well the procedure she is recommending will actually perform. Let's call the procedure "$t$" and let the estimand be $\theta$. Not knowing which distribution actually is the true one, she will contemplate the procedure's performance for every possible distribution $F \in \Omega$. Given such an $F$, and given any possible outcome $s$ (that is, a set of data), she will compare $t(s)$ (what her procedure estimates) to $\theta(F)$ (the value of the estimand for $F$). It is her client's responsibility to tell her how close or far apart those two are. (This is often done with a "loss" function.) She can then contemplate the expectation of the distance between $t(s)$ and $\theta(F)$, taken over datasets $s$ drawn from $F$. This is the risk of her procedure. Because it depends on $F$, the risk is a function defined on $\Omega$.

(Good) statisticians recommend procedures based on comparing risk. For instance, suppose that for every $F \in \Omega$, the risk of procedure $t_1$ is less than or equal to the risk of $t$, and strictly less for at least one $F$. Then there is no reason ever to use $t$: it is "inadmissible." Otherwise it is "admissible".
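As an illustration of risk as a function on $\Omega$ (a Monte Carlo sketch of my own, using the Poisson family from above and squared-error loss; it compares two procedures numerically rather than proving anything about admissibility):

```python
import numpy as np

rng = np.random.default_rng(2)

def risk(procedure, lam, n=30, reps=20_000):
    """Monte Carlo squared-error risk, at the state F = Poisson(lam), of a
    procedure used to estimate the Poisson variance (which equals lam)."""
    samples = rng.poisson(lam, size=(reps, n))
    estimates = procedure(samples)
    return np.mean((estimates - lam) ** 2)

sample_mean     = lambda s: s.mean(axis=1)
sample_variance = lambda s: s.var(axis=1, ddof=1)

# Risk depends on F, so evaluate it across a grid of possible states.
for lam in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"lam={lam:5.1f}  risk(sample mean)={risk(sample_mean, lam):8.4f}  "
          f"risk(sample variance)={risk(sample_variance, lam):8.4f}")
```

Printing the two risk estimates side by side over the grid lets you check, for this toy setup, whether one procedure's risk is smaller at every state considered, which is exactly the kind of pointwise comparison described above.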

(A "Bayesian" statistician will always compare risks by averaging over a "prior" distribution of possible states (usually supplied by the client). A "Frequentist" statistician might do this, if such a prior justifiably exists, but is also willing to compare risks in other ways Bayesians eschew.)

Conclusions

We have a right to say that any $t$ that is admissible for $\theta$ is an estimator of $\theta$. We must, for practical purposes (because admissible procedures can be hard to find), bend this to saying that any $t$ that has acceptably small risk (when compared to $\theta$) among practicable procedures is an estimator of $\theta$. "Acceptably" and "practicable" are determined by the client, of course: "acceptably" refers to their risk and "practicable" reflects the cost (ultimately paid by them) of implementing the procedure.

Underlying this concise definition are all the ideas just discussed: to understand it we must have in mind a specific $\Omega$ (which is a model of the problem, process, or population under study), a definite estimand (supplied by the client), a specific loss function (which quantitatively connects $t$ to the estimand and is also given by the client), the idea of risk (computed by the statistician), some procedure for comparing risk functions (the responsibility of the statistician in consultation with the client), and a sense of what procedures actually can be carried out (the "practicability" issue), even though none of these are explicitly mentioned in the definition.
