What is the relation between estimator and estimate?
estimation, estimators, terminology
Related Solutions
To define the two terms without using too much technical language:
An estimator is consistent if, as the sample size increases, the estimates (produced by the estimator) "converge" to the true value of the parameter being estimated. To be slightly more precise - consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.
The two are not equivalent: Unbiasedness is a statement about the expected value of the sampling distribution of the estimator. Consistency is a statement about "where the sampling distribution of the estimator is going" as the sample size increases.
It certainly is possible for one condition to be satisfied but not the other - I will give two examples. For both examples consider a sample $X_1, ..., X_n$ from a $N(\mu, \sigma^2)$ population.
Unbiased but not consistent: Suppose you're estimating $\mu$. Then $X_1$ is an unbiased estimator of $\mu$ since $E(X_1) = \mu$. But, $X_1$ is not consistent since its distribution does not become more concentrated around $\mu$ as the sample size increases - it's always $N(\mu, \sigma^2)$!
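A minimal simulation sketch can make this concrete (the values $\mu = 5$ and $\sigma = 2$ below are purely illustrative assumptions, not part of the original example): across many replications, the spread of $X_1$ stays at $\sigma$ no matter how large $n$ is, while the spread of the sample mean shrinks toward zero.

```python
# Illustrative sketch: compare how the sampling distributions of X_1 and the
# sample mean behave as n grows, for a N(mu=5, sigma=2) population.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, reps = 5.0, 2.0, 10_000

for n in (10, 100, 1000):
    samples = rng.normal(mu, sigma, size=(reps, n))
    first_obs = samples[:, 0]           # the estimator X_1: never concentrates
    sample_mean = samples.mean(axis=1)  # the sample mean: concentrates at mu
    print(f"n={n:5d}  sd(X_1)={first_obs.std():.3f}  sd(X_bar)={sample_mean.std():.3f}")
```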
Consistent but not unbiased: Suppose you're estimating $\sigma^2$. The maximum likelihood estimator is $$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \overline{X})^2 $$ where $\overline{X}$ is the sample mean. It is a fact that $$ E(\hat{\sigma}^2) = \frac{n-1}{n} \sigma^2 $$ which can be derived using the information here. Therefore $\hat{\sigma}^2$ is biased for any finite sample size. We can also easily derive that $${\rm var}(\hat{\sigma}^2) = \frac{ 2\sigma^4(n-1)}{n^2}$$ From these facts we can informally see that the distribution of $\hat{\sigma}^2$ is becoming more and more concentrated at $\sigma^2$ as the sample size increases since the mean is converging to $\sigma^2$ and the variance is converging to $0$. (Note: This does constitute a proof of consistency, using the same argument as the one used in the answer here)
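A quick Monte Carlo check of those two facts (the values $\mu = 0$, $\sigma^2 = 4$ and the replication counts are illustrative assumptions): the average of $\hat{\sigma}^2$ across simulated samples tracks $\frac{n-1}{n}\sigma^2$, and its variance shrinks toward zero as $n$ grows.

```python
# Illustrative sketch: the MLE of sigma^2 is biased by the factor (n-1)/n,
# but both the bias and the variance vanish as n increases.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, reps = 0.0, 4.0, 20_000

for n in (5, 50, 500):
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    # MLE: average squared deviation from the sample mean, within each replication
    mle = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)
    print(f"n={n:4d}  mean(MLE)={mle.mean():.3f}  "
          f"(n-1)/n * sigma^2={(n - 1) / n * sigma2:.3f}  var(MLE)={mle.var():.4f}")
```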
Both are unbiased, but unbiasedness is a statistical property, a statement about a collection of estimates: if you had several data sets from the same model, the mean of the various estimates you obtained would be the 'true' one.
In practice, one rarely has several data sets for the same model. Therefore we are also interested in the variance of the estimates, so that the particular estimate we obtain on our data set will be close to the 'true' value.
Therefore, the smaller the variance of the estimate, the better. Here, try to show that the OLS estimator has a smaller variance than this one.
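The alternative estimator referred to above is not reproduced here, so as a stand-in the sketch below compares the OLS slope with a simple "endpoint" slope estimator $(y_n - y_1)/(x_n - x_1)$, which is also unbiased under the model but has a larger variance; the design points and parameter values are illustrative assumptions.

```python
# Illustrative Monte Carlo comparison of two unbiased slope estimators:
# both average out to the true slope, but OLS has the smaller variance.
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = np.linspace(0, 10, 21)      # fixed design points set by the experimenter
reps = 20_000

y = beta0 + beta1 * x + rng.normal(0, sigma, size=(reps, x.size))
xc = x - x.mean()
ols_slope = (y - y.mean(axis=1, keepdims=True)) @ xc / (xc @ xc)
endpoint_slope = (y[:, -1] - y[:, 0]) / (x[-1] - x[0])

print(f"OLS slope:      mean {ols_slope.mean():.3f}  variance {ols_slope.var():.5f}")
print(f"endpoint slope: mean {endpoint_slope.mean():.3f}  variance {endpoint_slope.var():.5f}")
```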
Best Answer
E. L. Lehmann, in his classic Theory of Point Estimation, answers this question on pp. 1–2.
In words: an estimator is a definite mathematical procedure that comes up with a number (the estimate) for any possible set of data that a particular problem could produce. That number is intended to represent some definite numerical property ($g(\theta)$) of the data-generation process; we might call this the "estimand."
The estimator itself is not a random variable: it's just a mathematical function. However, the estimate it produces is based on data which themselves are modeled as random variables. This makes the estimate (thought of as depending on the data) into a random variable and a particular estimate for a particular set of data becomes a realization of that random variable.
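A tiny sketch of that distinction (the sample mean and the normal data below are just illustrative choices): the estimator is a fixed function, and each simulated data set fed into it produces a different realized estimate.

```python
# Illustrative sketch: one estimator (a fixed function), many estimates
# (one realization per data set).
import numpy as np

def sample_mean(data):
    """The estimator: a definite procedure mapping any data set to a number."""
    return float(np.mean(data))

rng = np.random.default_rng(3)
for trial in range(3):
    data = rng.normal(loc=10.0, scale=2.0, size=50)  # data modeled as random
    print(f"data set {trial + 1}: estimate = {sample_mean(data):.3f}")
```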
In one (conventional) ordinary least squares formulation, the data consist of ordered pairs $(x_i, y_i)$. The $x_i$ have been determined by the experimenter (they can be amounts of a drug administered, for example). Each $y_i$ (a response to the drug, for instance) is assumed to come from a probability distribution that is Normal but with unknown mean $\mu_i$ and common variance $\sigma^2$. Furthermore, it is assumed that the means are related to the $x_i$ via a formula $\mu_i = \beta_0 + \beta_1 x_i$. These three parameters--$\sigma$, $\beta_0$, and $\beta_1$--determine the underlying distribution of $y_i$ for any value of $x_i$. Therefore any property of that distribution can be thought of as a function of $(\sigma, \beta_0, \beta_1)$. Examples of such properties are the intercept $\beta_0$, the slope $\beta_1$, the value of $\cos(\sigma + \beta_0^2 - \beta_1)$, or even the mean at the value $x=2$, which (according to this formulation) must be $\beta_0 + 2 \beta_1$.
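As a concrete sketch of that formulation (the particular values of $\beta_0$, $\beta_1$, $\sigma$ and the design points are illustrative assumptions), one can fit the model by least squares and then estimate a derived property such as the mean response at $x = 2$, namely $\beta_0 + 2\beta_1$:

```python
# Illustrative OLS fit: any function of (beta0, beta1, sigma) is an estimand,
# e.g. the mean of y at x = 2, which under the model equals beta0 + 2*beta1.
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma = 1.5, 0.8, 0.5            # "unknown" true parameters
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # doses set by the experimenter
y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)

X = np.column_stack([np.ones_like(x), x])      # design matrix: intercept + slope
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"estimate of the intercept beta0:            {b0_hat:.3f}")
print(f"estimate of the mean at x = 2 (b0 + 2*b1):  {b0_hat + 2 * b1_hat:.3f}")
```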
In this OLS context, a non-example of an estimator would be a procedure to guess at the value of $y$ if $x$ were set equal to 2. This is not an estimator because this value of $y$ is random (in a way completely separate from the randomness of the data): it is not a (definite numerical) property of the distribution, even though it is related to that distribution. (As we just saw, though, the expectation of $y$ for $x=2$, equal to $\beta_0 + 2 \beta_1$, can be estimated.)
In Lehmann's formulation, almost any formula can be an estimator of almost any property. There is no inherent mathematical link between an estimator and an estimand. However, we can assess--in advance--the chance that an estimator will be reasonably close to the quantity it is intended to estimate. Ways to do this, and how to exploit them, are the subject of estimation theory.