[Math] Variance stabilization for Poisson data

Tags: random variables, statistics, transformation

Intro
Let $Z > 0$ be a random variable with mean $\mathbb{E}\{ Z \}$ and variance $\operatorname{Var}\{ Z \}$. A variance-stabilizing transformation (VST) $f(z)$ turns heteroskedastic data $z$ into homoskedastic data $f(z)$ with (approximately) constant variance, e.g., unit variance.

Poisson distribution
For Poisson-distributed data with $\mathbb{E}\{ Z \} = \operatorname{Var}\{ Z \} = \lambda$, this VST, the so-called Anscombe transformation, is given by [1,2]:

$$f(z) = 2\sqrt{z + 3/8}$$

Based on a first-order Taylor expansion, we can write (this is known as the delta method in the literature) [3]:

$$\operatorname{Var}\{f(z)\} \approx \left( \left.\frac{df}{dz}\right|_{z=\mathbb{E}\{ Z \}} \right)^2 \operatorname{Var}\{Z\} = \frac{\operatorname{Var}\{Z\}}{\mathbb{E}\{ Z \} + 3/8}$$
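
For the step above, note that the derivative of the Anscombe transform is

$$\frac{df}{dz} = \frac{1}{\sqrt{z + 3/8}},$$

so squaring it at $z = \mathbb{E}\{Z\}$ yields the factor $1/(\mathbb{E}\{Z\} + 3/8)$.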

Problem
I performed Monte Carlo simulations to compare the sample variance of the stabilized data $f(z)$ with the variance predicted by the equation above, i.e., with $\operatorname{Var}\{f(z)\}$ computed both numerically as a sample variance and theoretically as:

$$\operatorname{Var}\{f(z)\} \approx \frac{\lambda}{\lambda + 3/8}$$

Moreover, I went further and derived a second-order approximation for $\operatorname{Var}\{f(z)\}$.

There is a mismatch between the variance of the stabilized data $f(z)$ (green curve) and the values obtained both theoretically and numerically via $\operatorname{Var}\{f(z)\}$. Can anyone explain this inconsistency to me?

[Figure: sample variance of the stabilized data $f(z)$ (green curve) compared with the theoretical and numerical approximations of $\operatorname{Var}\{f(z)\}$]
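
For concreteness, here is a minimal Monte Carlo sketch of the comparison (assuming Python with NumPy; the sample size, seed, and $\lambda$ grid are illustrative, not the OP's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # Monte Carlo draws per lambda (illustrative)

for lam in [0.1, 0.5, 1.0, 2.0, 5.0, 20.0, 100.0]:
    z = rng.poisson(lam, size=n)
    fz = 2.0 * np.sqrt(z + 3.0 / 8.0)    # Anscombe transform
    sample_var = fz.var(ddof=1)          # empirical variance of the stabilized data
    delta_var = lam / (lam + 3.0 / 8.0)  # first-order delta-method prediction
    print(f"lambda={lam:7.2f}  sample Var={sample_var:.4f}  delta method={delta_var:.4f}")
```

For small $\lambda$ the two numbers disagree noticeably, while both approach 1 as $\lambda$ grows.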

  1. Anscombe, F. J. (1948). "The transformation of Poisson, binomial and negative-binomial data". Biometrika 35 (3–4): 246–254. doi:10.1093/biomet/35.3-4.246. JSTOR 2332343.
  2. http://en.wikipedia.org/wiki/Anscombe_transform
  3. Stuart, A. and Ord, K. (2009). Kendall's Advanced Theory of Statistics, Volume 1: Distribution Theory. p. 351, eq. 10.14.

Best Answer

Nice post!

I think the problem is with the delta method in general: you are linearizing $f$, which is not a good approximation when $f$ is far from linear near the mean, and that is exactly what happens for small $\lambda$. Considering higher-order terms in the Taylor series should help, but then the variance calculation is not so clear-cut.
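
For reference, here is a sketch of what the second-order computation looks like; this is one standard way to carry it out and may well differ from the OP's own derivation. Truncating the Taylor series at the quadratic term,

$$f(Z) \approx f(\mu) + f'(\mu)(Z - \mu) + \tfrac{1}{2} f''(\mu)(Z - \mu)^2,$$

and taking the variance of the right-hand side gives

$$\operatorname{Var}\{f(Z)\} \approx f'(\mu)^2 \sigma^2 + f'(\mu) f''(\mu)\,\mu_3 + \tfrac{1}{4} f''(\mu)^2 \left(\mu_4 - \sigma^4\right),$$

where $\mu_3$ and $\mu_4$ are the third and fourth central moments of $Z$. For the Poisson case ($\sigma^2 = \mu_3 = \lambda$, $\mu_4 = 3\lambda^2 + \lambda$) with $f(z) = 2\sqrt{z + 3/8}$, this evaluates to

$$\operatorname{Var}\{f(Z)\} \approx \frac{\lambda}{\lambda + 3/8} - \frac{\lambda}{2(\lambda + 3/8)^2} + \frac{2\lambda^2 + \lambda}{16(\lambda + 3/8)^3}.$$

Note that even this truncation silently drops contributions from the cubic Taylor term that enter at the same order in $1/\lambda$, which is part of why the bookkeeping quickly stops being clear-cut.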

To draw a parallel from statistics, this would work perfectly if the underlying distribution were Normal. The more non-Normal it is (that is, the smaller $\lambda$), the more mismatch you'll see. If you keep increasing $\lambda$, all the curves will merge.
