Variance Estimation – Calculating Variance of the Reciprocal


Background

I've recently read the paper

Leo A. Goodman, On the Exact Variance of Products
Journal of the American Statistical Association
Vol. 55, No. 292 (Dec., 1960), pp. 708-713

from which I extract the following quotes (edited to remove superfluous calculations and sentences):

Let $x$ and $y$ be two independent random variables. Let us denote the expected value of $x$ by $E(x) = X$, the variance of $x$ by $V(x)$, … A similar notation will be used for the random variable $y$.

…we have that the variance $V(xy)$ of the product $xy$ is equal to
$$ V(xy) = \ldots = X^2V(y) + Y^2V(x) + V(x)V(y)$$

… We shall now present an unbiased estimate of the variance $V(xy)$.
… we have that
$$ v(xy) = \ldots = x^2v(y) + y^2v(x) - v(x)v(y)$$

is an unbiased estimate of $V(xy)$, where $v(x)$ is an unbiased estimate of $V(x)$ and $v(y)$ is an unbiased estimate of $V(y)$.
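Both results can be checked numerically. The sketch below is a Monte Carlo check under assumptions of my own, not Goodman's: $x$ and $y$ are sample means of $n = 10$ independent normal draws, and $v(x)$, $v(y)$ are the usual unbiased estimates $s^2/n$ of their variances. The exact formula should match the simulated variance of $xy$, and the average of $v(xy)$ over many replications should match the same value. Unbiasedness holds because $E[x^2] = X^2 + V(x)$ and independence lets the expectations of the cross terms factor. All function names are hypothetical.

```python
import random

def goodman_exact_var(X, Vx, Y, Vy):
    """Exact V(xy) for independent x, y: X^2 V(y) + Y^2 V(x) + V(x) V(y)."""
    return X**2 * Vy + Y**2 * Vx + Vx * Vy

def goodman_unbiased_estimate(x, vx, y, vy):
    """Goodman's unbiased estimate v(xy) = x^2 v(y) + y^2 v(x) - v(x) v(y)."""
    return x**2 * vy + y**2 * vx - vx * vy

def sample_mean_and_var_of_mean(rng, mu, sigma, n):
    """Sample mean of n normal draws, plus the unbiased estimate s^2/n of its variance."""
    draws = [rng.gauss(mu, sigma) for _ in range(n)]
    m = sum(draws) / n
    s2 = sum((d - m) ** 2 for d in draws) / (n - 1)
    return m, s2 / n

rng = random.Random(1960)  # arbitrary seed
n, reps = 10, 20_000
products, estimates = [], []
for _ in range(reps):
    x, vx = sample_mean_and_var_of_mean(rng, 2.0, 1.0, n)  # X = 2, V(x) = 1/10
    y, vy = sample_mean_and_var_of_mean(rng, 3.0, 1.5, n)  # Y = 3, V(y) = 2.25/10
    products.append(x * y)
    estimates.append(goodman_unbiased_estimate(x, vx, y, vy))

exact = goodman_exact_var(2.0, 1.0 / n, 3.0, 2.25 / n)  # 0.9 + 0.9 + 0.0225 = 1.8225
mean_p = sum(products) / reps
simulated = sum((p - mean_p) ** 2 for p in products) / (reps - 1)
average_estimate = sum(estimates) / reps
print(exact, simulated, average_estimate)
```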

I have a relatively simple formula $P = w + xy/(1-z)$, where each of these (independent!) variables has been estimated by a statistical package and supplied along with 95% confidence limits and standard errors (hence variances). In fact, each of $w,x,y,z$ is a probability, and $z$ is bounded away from 1. (As an example of the magnitudes involved, one instance of the problem has $0.1 \lt w,x,y,z \lt 0.6$ and all standard errors about $3 \times 10^{-3}$.)

Questions

I need to estimate confidence limits for $P$. My first idea was to use the confidence limits of $w,x,y,z$, but that looks tricky/inadvisable. My second idea was to work out the variance of $P$. Since $w$ is independent of the other three variables, $Var(P) = Var(w) + Var(xy/(1-z))$, so this boils down to finding the variance of $xy/(1-z)$.

Someone has told me that I should use the equation for $v(xy)$ in the context of my formula. That is all well and good, I can accept that. So now all I need to do is find the variance of $1/(1-z)$ and apply the result of the Goodman paper twice, or perhaps only find the variance of $y/(1-z)$ and use the Goodman result once. For argument's sake, let's do the former.
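A common alternative to chaining the Goodman result is to apply the first-order delta method (the same technique the answer below uses for $1/(1-z)$) to $P$ directly: with independent inputs, $Var(P)$ is approximately the sum of each squared partial derivative times the corresponding variance. A minimal sketch, assuming the four estimates are independent and approximately normal; the function name and the input values are hypothetical, chosen within the range quoted above.

```python
import math

def delta_method_P(w, x, y, z, se_w, se_x, se_y, se_z, z_crit=1.96):
    """First-order (delta-method) SE and normal-theory CI for P = w + x*y/(1 - z),
    assuming w, x, y, z are independent estimates with the given standard errors."""
    p = w + x * y / (1 - z)
    # Partial derivatives of P, evaluated at the point estimates
    grad = {
        "w": 1.0,
        "x": y / (1 - z),
        "y": x / (1 - z),
        "z": x * y / (1 - z) ** 2,
    }
    ses = {"w": se_w, "x": se_x, "y": se_y, "z": se_z}
    # Independence means no covariance terms: scaled variances simply add
    var_p = sum(grad[k] ** 2 * ses[k] ** 2 for k in grad)
    se_p = math.sqrt(var_p)
    return p, se_p, (p - z_crit * se_p, p + z_crit * se_p)

# Hypothetical inputs: w, x, y, z in (0.1, 0.6), all standard errors 3e-3
p, se_p, ci = delta_method_P(0.2, 0.3, 0.4, 0.5, 3e-3, 3e-3, 3e-3, 3e-3)
print(p, se_p, ci)
```

The 1.96 multiplier assumes approximate normality of $P$; with standard errors this small relative to the estimates, that is usually a reasonable working assumption, but it remains an assumption.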

I found on the internet a rough set of notes which estimated the variance of a ratio $x/y$ to be (taking the special case of $x,y$ independent)
$$
Var(x/y) \approx \frac{E(y)^2 Var(x) + E(x)^2 Var(y)}{E(y)^4}
$$
and for the case that I am interested in, I can take $x$ to be the constant $1$ (so $E(x) = 1$ and $Var(x) = 0$) and so get
$$
Var(1/y) \approx \frac{Var(y)}{E(y)^4} \quad \quad (1)
$$
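The reduction from the ratio formula to (1) can be checked in a few lines. This sketch (function names hypothetical) codes both formulas, treats $x$ as the constant 1, and applies the result to $y = 1 - z$, which has $E(y) = 1 - E(z)$ and $Var(y) = Var(z)$. The value $E(z) = 0.5$ is an assumed illustration, with the standard error of $3 \times 10^{-3}$ quoted in the question.

```python
def var_ratio_indep(mean_x, var_x, mean_y, var_y):
    """First-order approximation to Var(x/y) for independent x and y (formula from the notes)."""
    return (mean_y**2 * var_x + mean_x**2 * var_y) / mean_y**4

def var_reciprocal(mean_y, var_y):
    """Approximation (1): Var(1/y) ~ Var(y) / E(y)^4."""
    return var_y / mean_y**4

# Treating x as the constant 1 (E(x) = 1, Var(x) = 0) reduces the ratio formula to (1).
mean_z, var_z = 0.5, (3e-3) ** 2  # hypothetical E(z); SE of 3e-3 as in the question
print(var_ratio_indep(1.0, 0.0, 1 - mean_z, var_z))
print(var_reciprocal(1 - mean_z, var_z))
```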
Is this reliable/right? Even if it is, I am now faced with a small conundrum: what is the analogue, in this instance, of the formula for $v$?

I am happy to take any answers that address my original problem; the question of approximating $Var(1/(1-z))$, whether I use $Var$ as given in approximation (1) or some "unbiased estimate" in terms of the data I do have; and, lastly, what this "unbiased estimate" would be, given (1).

Best Answer

If you can't get a predictive accuracy out of the package, this may help.

1) A better approximation to $Var(x/y)$, which to some extent takes covariation into account, is:

$$Var(x/y) \approx \left(\frac{E(x)}{E(y)}\right)^2 \left(\frac{Var(x)}{E(x)^2} + \frac{Var(y)}{E(y)^2} - 2 \frac{Cov(x,y)}{E(x)E(y)}\right)$$

2) For approximating the variance of a transform of a random variate, the delta method sometimes, but not always, gives good results. In this case, it gives the following, which corresponds to your formula (1):

$$Var(1/(1-z)) \approx \frac{Var(z)}{(1-E(z))^4}$$

So now you know where that comes from! Using more terms of the underlying Taylor expansion gives a higher-order, although not necessarily better, approximation:

$$Var(1/(1-z)) \approx \frac{Var(z)}{(1-E(z))^4} + 2\frac{E[(z-E(z))^3]}{(1-E(z))^5} + \frac{E[(z-E(z))^4]}{(1-E(z))^6}$$
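The two numbered approximations above can be sketched as plain functions. The first-order formula follows from $g(z) = 1/(1-z)$, $g'(z) = 1/(1-z)^2$ and $Var(g(z)) \approx g'(E(z))^2\,Var(z)$. Function names and the `subtract_var_squared` switch are illustrative choices, not from the answer. The switch exists because, writing $d = z - E(z)$ and $a = 1 - E(z)$, the variance of the two-term Taylor polynomial $d/a^2 + d^2/a^3$ has $E[d^4] - Var(z)^2$ in its last term, whereas the formula as written uses $E[d^4]$ alone; for the uniform example below the two versions differ by about 0.006. As a reference point, the example evaluates everything at the exact moments of $U(0.1, 0.6)$ and compares with the exact variance $2(1/0.4 - 1/0.9) - (2\ln 2.25)^2$.

```python
import math

def var_ratio(mean_x, var_x, mean_y, var_y, cov_xy=0.0):
    """Approximation 1): Var(x/y), allowing a covariance between x and y (mean_x != 0)."""
    rel = var_x / mean_x**2 + var_y / mean_y**2 - 2 * cov_xy / (mean_x * mean_y)
    return (mean_x / mean_y) ** 2 * rel

def var_recip_first_order(mean_z, var_z):
    """Delta method: Var(1/(1-z)) ~ Var(z) / (1 - E(z))^4."""
    return var_z / (1 - mean_z) ** 4

def var_recip_higher_order(mean_z, var_z, m3_z, m4_z, subtract_var_squared=False):
    """Higher-order formula from the answer, using central moments m3_z and m4_z.

    subtract_var_squared=True (hypothetical option) uses m4_z - var_z**2 instead,
    which is the exact variance of the two-term Taylor polynomial.
    """
    a = 1 - mean_z
    m4_term = m4_z - var_z**2 if subtract_var_squared else m4_z
    return var_z / a**4 + 2 * m3_z / a**5 + m4_term / a**6

# Exact central moments of U(0.1, 0.6): width 0.5, symmetric so the third moment is 0
width = 0.5
mean_u, var_u, m3_u, m4_u = 0.35, width**2 / 12, 0.0, width**4 / 80
first = var_recip_first_order(mean_u, var_u)
higher = var_recip_higher_order(mean_u, var_u, m3_u, m4_u)
higher_exact_taylor = var_recip_higher_order(mean_u, var_u, m3_u, m4_u, subtract_var_squared=True)
# Exact Var(1/(1-z)) for z ~ U(0.1, 0.6), whose density is 2 on that interval
exact = 2 * (1 / 0.4 - 1 / 0.9) - (2 * math.log(0.9 / 0.4)) ** 2
print(first, higher, higher_exact_taylor, exact)
```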

I tried this out via simulation using 10,000 $U(0.1,0.6)$ variates, mimicking the example range you provided in your question, and obtained the following results. The observed variance of $1/(1-z)$ was 0.149. The first-order delta approximation yielded a value of 0.117, and the higher-order delta approximation yielded a value of 0.128. Drawing 10,000 values from a Beta(10,20) distribution gave results of similar relative accuracy: the observed variance of $1/(1-z)$ was 0.044 and the higher-order delta approximation gave a value of 0.039.
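A sketch of this kind of experiment, assuming the moments of $z$ are estimated from the same draws (the answer does not say exactly how it computed them). Because the seed and random number generator differ from the original run, the numbers will not match the ones quoted above exactly, only approximately; function names are hypothetical.

```python
import random

def central_moments(values):
    """Mean and 2nd to 4th central moments (divisor n) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    return (mean,
            sum(d**2 for d in dev) / n,
            sum(d**3 for d in dev) / n,
            sum(d**4 for d in dev) / n)

def compare_approximations(z_draws):
    """Observed Var(1/(1-z)) versus the first-order and higher-order delta approximations,
    with the moments of z estimated from the same draws."""
    mean, var, m3, m4 = central_moments(z_draws)
    observed = central_moments([1 / (1 - z) for z in z_draws])[1]
    a = 1 - mean
    first = var / a**4
    higher = first + 2 * m3 / a**5 + m4 / a**6
    return observed, first, higher

rng = random.Random(12345)  # arbitrary seed; results vary slightly with the seed
uniform_result = compare_approximations([rng.uniform(0.1, 0.6) for _ in range(10_000)])
beta_result = compare_approximations([rng.betavariate(10, 20) for _ in range(10_000)])
print("U(0.1, 0.6):  observed, first-order, higher-order =", uniform_result)
print("Beta(10, 20): observed, first-order, higher-order =", beta_result)
```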

How you would get the third and fourth moments of your estimates I'm not sure. You could, if your sample sizes give you some confidence in being close to asymptotic normality for your estimates, just use those of the Normal distribution. A bootstrap is a possibility as well, if you can do it. Either way, with small samples you're probably better off with the one-term approximation.
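If the estimates are close to normal, their third and fourth central moments are $0$ and $3\,Var(z)^2$. Plugging those into the higher-order formula is a quick way to see how much the extra terms matter at the magnitudes in the question (standard errors about $3 \times 10^{-3}$). A sketch with a hypothetical $E(z) = 0.5$; the function name is illustrative.

```python
def var_recip_normal_moments(mean_z, se_z):
    """First-order and higher-order Var(1/(1-z)), with Normal moments plugged in:
    third central moment 0, fourth central moment 3 * se_z**4."""
    var_z = se_z**2
    a = 1 - mean_z
    first = var_z / a**4
    # The 2*m3/a^5 term vanishes because m3 = 0 under normality
    higher = first + 3 * var_z**2 / a**6
    return first, higher

first, higher = var_recip_normal_moments(0.5, 3e-3)  # hypothetical E(z) = 0.5
print(first, higher, higher / first - 1)
```

Under these assumptions the extra term is roughly $10^{-4}$ of the first-order value, so at these magnitudes the one-term approximation gives up essentially nothing.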

Of course, I could simplify all this notation by just defining $z' = 1-z$ and using that, but I chose to stick with the original notation in the question.

Related Solutions

Solved – Variance of sample Variance

Note that $(S^2)^2$ has terms involving $X_i^4$, and so $E[(S^2)^2]$ is a sum of terms involving $E[X_i^4]$. Thus, if the fourth moment is not finite, neither is $E[(S^2)^2]$ finite, nor is var$(S^2)$ finite. Some people say that quantities such as expectations and variances must be said to be undefined when the corresponding integrals/sums diverge. Others reserve the term "undefined" for cases when the integrals/sums lead to indeterminate forms such as $\infty - \infty$. The latter group would say that for a Cauchy random variable $X$, $E[X]$ is undefined, $E[X^2]$ is defined (but infinite), and var$(X)$ is undefined (since $E[X]$ is undefined, var$(X) = E[X^2] - (E[X])^2$ makes no sense). The former group would say that $E[X]$, $E[X^2]$, and var$(X)$ are all undefined for a Cauchy random variable.
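For an i.i.d. sample of size $n$ with finite fourth central moment $\mu_4$, the standard result $\text{var}(S^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$ makes the dependence on the fourth moment explicit. A sketch evaluating it for normal data, where $\mu_4 = 3\sigma^4$ and it reduces to $2\sigma^4/(n-1)$, with a Monte Carlo check; the sample size, $\sigma$, seed, and function names are arbitrary choices.

```python
import random

def var_of_sample_variance(mu4, sigma2, n):
    """var(S^2) for an i.i.d. sample of size n; finite only if mu4 is finite."""
    return (mu4 - (n - 3) / (n - 1) * sigma2**2) / n

def simulate_var_of_s2(sigma, n, reps, seed=0):
    """Monte Carlo variance of S^2 over reps normal samples of size n."""
    rng = random.Random(seed)
    s2_values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2_values.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    m = sum(s2_values) / reps
    return sum((s - m) ** 2 for s in s2_values) / (reps - 1)

sigma, n = 2.0, 10
theory = var_of_sample_variance(3 * sigma**4, sigma**2, n)  # normal: mu4 = 3 sigma^4
simulated = simulate_var_of_s2(sigma, n, reps=20_000)
print(theory, 2 * sigma**4 / (n - 1), simulated)
```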

Variance Estimation – Methods for Estimating Variance Given the Mean

Suppose you have a random sample of size $n$ from the population $\mathsf{Norm}(\mu, \sigma),$ where $\sigma$ is not known and $\mu$ is known.

Let $V = \frac 1n\sum_{i=1}^n (X_i - \mu)^2.$

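Then $V$ is a better estimate of the population variance $\sigma^2$ than is $S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2,$ where $\bar X =\frac 1 n \sum_{i=1}^n X_i.$ Both are unbiased, but for normal data $Var(V) = 2\sigma^4/n,$ which is smaller than $Var(S^2) = 2\sigma^4/(n-1).$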
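A quick Monte Carlo comparison of the two estimators, assuming normal data with $\mu = 20$ known and $\sigma = 3$, and a small sample size $n = 5$ so the difference is easy to see. The seed, sample size, and function name are arbitrary choices for illustration; Python's generator will not reproduce R's `set.seed` streams.

```python
import random
import statistics

def compare_variance_estimators(mu=20.0, sigma=3.0, n=5, reps=20_000, seed=1):
    """Monte Carlo comparison of V (known mean) and S^2 (estimated mean) for normal data."""
    rng = random.Random(seed)
    v_values, s2_values = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        v_values.append(sum((x - mu) ** 2 for x in xs) / n)           # uses the known mu
        s2_values.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # estimates mu by xbar
    return (statistics.fmean(v_values), statistics.variance(v_values),
            statistics.fmean(s2_values), statistics.variance(s2_values))

mean_v, var_v, mean_s2, var_s2 = compare_variance_estimators()
# Theory for sigma = 3, n = 5: both means are 9; Var(V) = 2*81/5 = 32.4, Var(S^2) = 2*81/4 = 40.5
print(mean_v, var_v, mean_s2, var_s2)
```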

Also, a 95% CI for $\sigma^2$ tends to be narrower if we use $V$ than if we use $S^2.$ [Samples can vary, so this CI is not always narrower.]

In particular, a 95% CI for $\sigma^2$ is based on the relationship $\frac{nV}{\sigma^2} \sim \mathsf{Chisq}(\nu = n).$

Example: Suppose I have the sample x of size $n = 50$ from $\mathsf{Norm}(\mu = 20, \sigma = 3),$ where I assume $\mu$ is known and $\sigma$ is not.

```r
set.seed(215)
x = rnorm(50, 20, 3)
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  14.21   17.97   19.94   20.30   22.62   29.31 

v = (sum((x-20)^2))/50;  v
[1] 10.69335

CI.1 = 50*v/qchisq(c(.975,.025), 50);  CI.1
[1]  7.486223 16.523827
diff(CI.1)
[1] 9.037604    # width of CI
```

The formula for this confidence interval is $\left(\frac{50V}{U}, \frac{50V}{L}\right),$ where $L$ and $U$ cut probabilities $0.025$ from the lower and upper tails, respectively, of $\mathsf{Chisq}(\nu=50).$ For the data of my example, the CI is $(7.49,\, 16.52)$ of width $9.04.$

By contrast, the 95% CI for $\sigma^2$ based on $S^2,$ where $\mu$ is estimated by $\bar X,$ uses the relationship $\frac{(n-1)S^2}{\sigma^2}\sim\mathsf{Chisq}(\nu=49).$

```r
CI.2 = 49*var(x)/qchisq(c(.975,.025), 49);  CI.2
[1]  7.548087 16.797538
diff(CI.2)
[1] 9.249451   # wider CI
```

For the data of my example, the CI is $(7.55,\, 16.80)$ of width $9.25 > 9.04.$
