Bias of estimating a Ratio

I would like to know why $\frac{1}{N} \sum_i\frac{y_i}{x_i}$ is a worse estimator than the average of $y$ divided by the average of $x$.

This is in the context of constructing an estimator of a ratio, such as cost per person.

I'm drawing from sample data as well.

Best Answer

I would like to know why $\frac{1}{N} \sum_i\frac{y_i}{x_i}$ is a worse estimator than average of y divided by average of x.

Estimator of what? If you want to estimate $E\left(\frac{y}{x}\right)$, then what you have proposed, $\overline{\;\frac{y}{x}}=\frac{1}{N} \sum_i\frac{y_i}{x_i}$, the sample mean of $\frac{y_i}{x_i}$, is often a dandy estimator (unbiased, consistent, etc.). On the other hand, if you want to estimate $\frac{E(y)}{E(x)}$, then it would be better to go with $\frac{\overline{y}}{\overline{x}}$. Under fairly weak assumptions it is at least consistent, though it is generally not unbiased in finite samples.
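To make "unbiased" versus "consistent but biased" concrete, here is a minimal Python sketch that is not part of the original answer. It assumes simple random sampling with replacement from a tiny hypothetical population of three (income, food) pairs, the same numbers as the family table in this answer, and computes the *exact* expected value of each estimator by enumerating every equally likely ordered sample of size $n$. The names `POPULATION` and `exact_expectations` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population: one (income x_i, food spending y_i) pair per family.
POPULATION = [(100_000, 20_000), (10_000, 8_000), (10_000, 8_000)]


def exact_expectations(n):
    """Exact E[mean of ratios] and E[ratio of means] over all len(POPULATION)**n
    ordered samples of size n drawn with replacement (each equally likely)."""
    samples = list(product(POPULATION, repeat=n))
    e_mean_of_ratios = Fraction(0)
    e_ratio_of_means = Fraction(0)
    for sample in samples:
        # Mean of the per-family ratios y_i / x_i.
        e_mean_of_ratios += sum(Fraction(y, x) for x, y in sample) / n
        # ybar / xbar: the 1/n factors cancel, leaving sum(y) / sum(x).
        e_ratio_of_means += Fraction(sum(y for _, y in sample), sum(x for x, _ in sample))
    return e_mean_of_ratios / len(samples), e_ratio_of_means / len(samples)


# Population parameters the two estimators target.
target_mean_ratio = sum(Fraction(y, x) for x, y in POPULATION) / len(POPULATION)
target_ratio_of_means = Fraction(sum(y for _, y in POPULATION), sum(x for x, _ in POPULATION))

for n in (1, 2, 4, 8):
    e_mr, e_rm = exact_expectations(n)
    print(f"n={n}: E[mean of ratios]={float(e_mr):.4f} (target {float(target_mean_ratio)}), "
          f"E[ratio of means]={float(e_rm):.4f} (target {float(target_ratio_of_means)})")
```

With these numbers the mean of ratios has expectation exactly $0.6$ at every $n$, matching $E(y/x)$, while the ratio of means has expectation $27/55 \approx 0.49$ at $n = 2$ against a target of $0.3$; the gap shrinks as $n$ grows, which is the biased-but-consistent pattern.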

The real question is, what do you want to estimate? Suppose $y$ is dollars spent on food, $x$ is dollars of income, and $i$ indexes families. Then the ratio $\frac{y_i}{x_i}$ is the proportion of family $i$'s income spent on food. The parameter $E(\frac{y}{x})$ is the average proportion of income spent on food across families. The parameter $\frac{E(y)}{E(x)}$ is the proportion of aggregate income spent on food. There is no reason in the world for these two things to be the same. Here is an example population:

\begin{align} \begin{array}{r r r r} \text{Family} & \text{Income} & \text{Food} & \text{Ratio}\\ \hline 1 & 100000 & 20000 & 0.2\\ 2 & 10000 & 8000 & 0.8\\ 3 & 10000 & 8000 & 0.8 \end{array} \end{align}

So the average ratio is 0.6 ($\frac{0.2+0.8+0.8}{3}$), while the ratio of the averages is 0.3 ($\frac{20000+8000+8000}{100000+10000+10000}$). Neither one of these is right. Neither one of these is wrong. They are just estimating different things. The aggregate ratio between food spending and income is 0.3. The average family, on the other hand, has a ratio between food spending and income of 0.6.
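A short Python check of the two numbers above, using exact fractions so that no floating-point rounding creeps in. The helper names `mean_of_ratios` and `ratio_of_means` are illustrative, not from any library.

```python
from fractions import Fraction

# The three families from the table: (income x_i, food spending y_i).
families = [(100_000, 20_000), (10_000, 8_000), (10_000, 8_000)]


def mean_of_ratios(pairs):
    """Average of the per-family ratios y_i / x_i (each family weighted 1/N)."""
    return sum(Fraction(y, x) for x, y in pairs) / len(pairs)


def ratio_of_means(pairs):
    """Average food spending divided by average income; the N's cancel."""
    return Fraction(sum(y for _, y in pairs), sum(x for x, _ in pairs))


print(float(mean_of_ratios(families)), float(ratio_of_means(families)))  # → 0.6 0.3
```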

One way to think about it is that the average of the ratios weights each family's ratio the same in computing the mean, and that the ratio of the averages weights the rich family's ratio more. Watch:

\begin{align} \frac{\overline{y}}{\overline{x}} &= \frac{\sum y_i}{\sum x_i}\\ &= \sum_i\frac{1}{\sum x_i}y_i\\ &= \sum_i\frac{x_i}{\sum x_i}\frac{y_i}{x_i}\\ & \\ \overline{\;\frac{y}{x}} &= \sum_i \frac{1}{N}\frac{y_i}{x_i} \end{align}

In the ratio of means, family $i$'s ratio "counts" with weight $\frac{x_i}{\sum x_i}$ in the overall mean; that is, it counts in proportion to its income. In the mean of the ratios, each family's ratio counts the same, $\frac{1}{N}$.
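The weighting argument can be checked numerically on the example population. This Python sketch (exact fractions; variable names are illustrative) computes each family's income share $\frac{x_i}{\sum x_i}$ and confirms that the income-weighted mean of the ratios equals $\frac{\overline{y}}{\overline{x}}$, while equal $\frac{1}{N}$ weights give the mean of the ratios.

```python
from fractions import Fraction

# (income x_i, food spending y_i) for the three families in the table.
families = [(100_000, 20_000), (10_000, 8_000), (10_000, 8_000)]
ratios = [Fraction(y, x) for x, y in families]
total_income = sum(x for x, _ in families)

# Weight each family's ratio by its income share x_i / sum(x) ...
income_weights = [Fraction(x, total_income) for x, _ in families]
income_weighted = sum(w * r for w, r in zip(income_weights, ratios))

# ... versus weighting every family equally, 1/N.
equal_weighted = sum(ratios) / len(ratios)

# The income-weighted mean of the ratios is exactly ybar / xbar.
ratio_of_means = Fraction(sum(y for _, y in families), total_income)
assert income_weighted == ratio_of_means

print([str(w) for w in income_weights], float(income_weighted), float(equal_weighted))
# → ['5/6', '1/12', '1/12'] 0.3 0.6
```

The rich family carries weight 5/6 in the ratio of means but only 1/3 in the mean of ratios, which is why the two answers differ so much here.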

Which one of these you want depends on what you are using it for. Suppose I asked you, "If I give every family \$1 extra, how much extra will be spent on food?" How might you approach the problem? You might decide to assume that each family's ratio will stay the same after the experiment (this assumption will drive economists crazy, since it conflates marginal and average spending, but that just adds to the fun). Then the increase in food spending from this experiment will be $N \cdot E\left(\frac{y}{x}\right)$. On the other hand, if I said that I was going to give away $N$ dollars to these families in proportion to their current income (more to rich families, less to poor ones), then exactly the same reasoning would lead you to expect an $N \cdot \frac{E(y)}{E(x)}$ dollar increase in food spending.
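Under the fixed-ratio assumption, both thought experiments can be run directly on the three-family table. In this Python sketch, `extra_food_spending` is a hypothetical helper, and gifts are kept as exact fractions of a dollar.

```python
from fractions import Fraction

# (income x_i, food spending y_i) for the three families in the table.
families = [(100_000, 20_000), (10_000, 8_000), (10_000, 8_000)]
N = len(families)
total_income = sum(x for x, _ in families)


def extra_food_spending(gifts):
    """Extra food spending if each family spends its gift at its current ratio y_i / x_i
    (the economically naive fixed-ratio assumption from the text)."""
    return sum(Fraction(y, x) * g for (x, y), g in zip(families, gifts))


# Experiment 1: every family gets $1.
flat = extra_food_spending([1] * N)

# Experiment 2: N dollars split in proportion to current income.
proportional = extra_food_spending([Fraction(N * x, total_income) for x, _ in families])

print(float(flat), float(proportional))  # → 1.8 0.9
```

These are $N$ times the mean of ratios ($3 \times 0.6$) and $N$ times the ratio of means ($3 \times 0.3$), respectively.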

The usual reason people give for preferring the ratio of averages is that it makes some kinds of arithmetic easier. For example, suppose I say, "There is a population of 50 families with average income equal to \$56,000. What will their total food spending be?" If you have calculated the ratio of averages, then you can answer something like, "Assuming the distribution of income in your population of families is the same as the distribution of income in my sample, total spending on food should be about $50\cdot\frac{\overline{y}}{\overline{x}}\cdot\$56{,}000$."
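As a sketch of that arithmetic, the following Python snippet treats the three-family table as the sample. The function name `projected_total_food` is hypothetical, and the projection rests on the same assumption stated in the answer (the population's income distribution matches the sample's).

```python
from fractions import Fraction

# Sample: (income x_i, food spending y_i) for the three families in the table.
sample = [(100_000, 20_000), (10_000, 8_000), (10_000, 8_000)]


def projected_total_food(n_families, avg_income, pairs):
    """Total food spending, approx. n_families * (ybar / xbar) * avg_income.
    Assumes the population's income distribution matches the sample's."""
    ratio = Fraction(sum(y for _, y in pairs), sum(x for x, _ in pairs))
    return n_families * ratio * avg_income


print(projected_total_food(50, 56_000, sample))  # → 840000
```

With $\frac{\overline{y}}{\overline{x}} = 0.3$ this gives 50 × 0.3 × \$56,000 = \$840,000; plugging in the mean of ratios (0.6) instead would double the figure.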