Variance Estimation – Calculating Variance of the Reciprocal

estimationrandom variablevariance

Background

I've recently read the paper

Leo A. Goodman, On the Exact Variance of Products
Journal of the American Statistical Association
Vol. 55, No. 292 (Dec., 1960), pp. 708-713

from where I extract the following edited quotes (removed superfluous calculations and sentences)

Let $x$ and $y$ be two independent random variables. Let us denote the expected value of x by $E(x) = X$, the variance of $x$ by $V(x)$, … A similar notation will be used for the random variable $y$.

…we have that the variance $V(xy)$ of the product $xy$ is equal to
$$ V(xy) = \ldots = X^2V(y) + Y^2V(x) + V(x)V(y)$$

… We shall now present an unbiased estimate of the variance $V(xy)$.
… we have that
$$ v(xy) = \ldots = x^2v(y) + y^2(x) – v(x)v(y)$$

is an unbiased estimate of $V(xy)$, where $v(x)$ is an unbiased estimate of $V(x)$ and $v(y)$ is an unbiased estimate of $V(y)$.

I have a relatively simple formula $P = w + xy/(1-z)$ where each of these (independent!) variables have been estimated by a statistical package, and supplied along with 95% confidence limits and standard errors (hence variances). In fact, each of $w,x,y,z$ are probabilities, and $z$ is bounded away from 1. (as an example of the magnitudes involved, one instance of the problem has $0.1 \lt w,x,y,z \lt 0.6$ and all standard errors about $3 \times 10^{-3}$)

Questions

I need to estimate some confidence limits on $P$, and my first idea was to use the confidence limits of $w,x,y,z$, but it looks tricky/inadvisable. My second idea was to work out the variance of $P$. This clearly boils down to finding the variance for $xy/(1-z)$.

Someone has told me that I should use the equation for $v(xy)$ in the context of my formula. That is all well and good, I can accept that. So now all I need to do is find the variance of $1/(1-z)$ and apply the result of the Goodman paper twice, or perhaps only find the variance of $y/(1-z)$ and use the Goodman result once. For argument's sake, let's do the former.

I found on the internet a rough set of notes which estimated the variance of a ratio $x/y$ to be (taking the special case of $x,y$ independent)
$$
Var(x/y) \approx \frac{E(y)^2 Var(x) + E(x)^2 Var(y)}{E(y)^4}
$$
and for the case that I am interested in, I can take $x \sim Uniform(0,1)$ (i.e. '$1$') and so get
$$
Var(1/y) \approx \frac{Var(y)}{E(y)^4} \quad \quad (1)
$$
Is this reliable/right? Even if it is, I now am faced with a small conundrum. What is the analogue in this instance for the formula for $v$?

I am happy to take all answers that address my original problem, the question of approximating $Var(1/(1-z))$, whether I use $Var$ as given in the approximation (1) or some "unbiased estimate" in terms of the data I do have, and lastly, what would this "unbiased estimate" be, given (1)?

Best Answer

If you can't get a predictive accuracy out of the package, this may help.

1) A better approximation to $Var(x/y)$, which to some extent takes covariation into account, is:

$Var(x/y) \approx \left(\frac{E(x)}{E(y)}\right)^2 \left(\frac{Var(x)}{E(x)^2} + \frac{Var(y)}{E(y)^2} - 2 \frac{Cov(x,y)}{E(x)E(y)}\right)$

2) For approximating the variance of a transform of a random variate, the delta method Wikipedia sometimes, but not always, gives good results. In this case, it gives, corresponding to your formula (1):

$Var(1/(1-z)) \approx \frac{Var(z)}{(1-E(z))^4}$

So now you know where that comes from! Using more terms from the underlying Taylor expansion etc. gives a higher-order, although not necessarily better, approximation:

$Var(1/(1-z)) \approx \frac{Var(z)}{(1-E(z))^4} + 2\frac{E[(z-E(z))^3]}{(1-E(z))^5} + \frac{E[(z-E(z))^4]}{(1-E(z))^6}$

I tried this out via simulation using 10,000 $U(0.1,0.6)$ variates, mimicking the example range you provided in your question, and obtained the following results. The observed variance of $1/(1-z)$ was 0.149. The first-order delta approximation yielded a value of 0.117. The next delta approximation yielded a value of 0.128. 10,000 draws from a Beta(10,20) distribution gave results of similar relative accuracy; the observed variance of $1/(1-z)$ was 0.044 and the higher-order delta approximation gave a value of 0.039.

How you would get the third and fourth moments of your estimates I'm not sure. You could, if your sample sizes give you some confidence in being close to asymptotic normality for your estimates, just use those of the Normal distribution. A bootstrap is a possibility as well, if you can do it. Either way, with small samples you're probably better off with the one-term approximation.

Of course, I could simplify all this notation by just defining $z' = 1-z$ and using that, but I chose to stick with the original notation in the question.

Related Question