Solved – How are estimators like the Horvitz-Thompson Estimator derived

Tags: decision-theory, estimation, mathematical-statistics, sampling

The Horvitz-Thompson Estimator is usually given by:

$$
\hat{Y}_{HT} = \sum_{i=1}^n \pi_i ^{-1} Y_i
$$

The proof that it is unbiased is straightforward. In addition, there exist other estimators for different designs, such as those in Rosenbaum and Rubin (1983). However, in each of the original papers, the estimator seemed to appear out of nowhere with no motivation, introduced only so that the author could show it was unbiased.

My question is: is there a systematic way to come up with unbiased estimators like these?

Best Answer

> the estimator seemed to appear out of nowhere with no motivation

If you think the idea behind stratified sampling is intuitive, then I believe Horvitz-Thompson should come as a natural extension rather than something out of the blue.

To illustrate how a simple stratified sample could help you come up with the formula, consider a case with two strata, $S_1$ and $S_2$, of known sizes $N_1$ and $N_2$, and suppose you draw simple random samples of sizes $n_1$ and $n_2$ from them, respectively. Now imagine you compute the sample averages $\bar{y}_1$ and $\bar{y}_2$.

How would you use this information to estimate the total $Y$? The natural way is to take the estimated average of each stratum and multiply by the total (population) number of elements of the stratum:

$$ \hat{Y} = N_1 \bar{y}_1 + N_2 \bar{y}_2 $$

Now take this simple expression and rewrite it, with each sum running over the sampled units of the corresponding stratum:

$$ \begin{align} \hat{Y} &= N_1 \sum_{i \in S_1}\frac{y_i}{n_1} + N_2 \sum_{i \in S_2}\frac{y_i}{n_2}\\ &= \sum_{i \in S_1}\frac{y_i}{n_1/N_1} + \sum_{i \in S_2}\frac{y_i}{n_2/N_2}\\ &= \sum_{i} \frac{y_i}{\pi_i} \end{align} $$

where $\pi_i = n_1/N_1$ if $i\in S_1$ and $\pi_i = n_2/N_2$ if $i\in S_2$. That is, simple stratified sampling already gives you the insight that, in essence, what we are doing is summing each sampled $y_i$ upweighted by the inverse of its probability of selection. From there, the idea of general inverse probability weighting, where each $\pi_i$ could be different, should come naturally.
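As a rough numerical check of the rewriting above, here is a minimal simulation sketch. It assumes a made-up population with two strata and simple random sampling without replacement within each stratum. The function names `stratified_total` and `horvitz_thompson_total` are purely illustrative and do not come from any library.

```python
import math
import random

def stratified_total(samples, sizes):
    """N_1 * ybar_1 + N_2 * ybar_2 + ... from per-stratum samples."""
    return sum(N * sum(ys) / len(ys) for ys, N in zip(samples, sizes))

def horvitz_thompson_total(ys, pis):
    """Sum of each sampled y_i divided by its inclusion probability pi_i."""
    return sum(y / pi for y, pi in zip(ys, pis))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: two strata with different sizes and means.
strata = [
    [rng.gauss(10, 2) for _ in range(200)],  # N_1 = 200
    [rng.gauss(50, 5) for _ in range(800)],  # N_2 = 800
]
sample_sizes = [20, 40]  # n_1, n_2
stratum_sizes = [len(s) for s in strata]
true_total = sum(sum(s) for s in strata)

def draw_once():
    """Draw one stratified sample and return both forms of the estimate."""
    samples = [rng.sample(s, n) for s, n in zip(strata, sample_sizes)]
    strat = stratified_total(samples, stratum_sizes)
    # Flatten the sample and attach pi_i = n_h / N_h to every sampled unit.
    ys, pis = [], []
    for smp, n, N in zip(samples, sample_sizes, stratum_sizes):
        ys.extend(smp)
        pis.extend([n / N] * len(smp))
    return strat, horvitz_thompson_total(ys, pis)

# The two expressions agree on any single sample (up to rounding).
strat, ht = draw_once()
same_estimate = math.isclose(strat, ht, rel_tol=1e-9)

# Averaged over many samples, the estimate should sit close to the true total.
estimates = [draw_once()[1] for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
relative_error = abs(mean_estimate - true_total) / true_total

print("identical on one sample:", same_estimate)
print("true total:", round(true_total, 1))
print("mean of HT estimates:", round(mean_estimate, 1))
```

The simulation only illustrates the stratified special case. With unequal $\pi_i$ per unit, the same `horvitz_thompson_total` function applies unchanged, provided the inclusion probabilities are known and positive.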
