What distribution does the inverse normal CDF of a beta random variable follow?

Tags: beta distribution, mathematical-statistics, normal distribution, r

Suppose you define:

$$X\sim\mbox{Beta}(\alpha,\beta)$$

$$Y=\Phi^{-1}(X)$$

where $\Phi^{-1}$ is the inverse of the CDF of the standard normal distribution.

My question is: Is there a simple distribution that $Y$ follows, or that can approximate $Y$? I'm asking because I have a strong suspicion, based on simulation results (shown below), that $Y$ converges to a normal distribution when $\alpha$ and $\beta$ are large, but I don't know why it would mathematically. (Of course, when $\alpha=\beta=1$, $X$ is uniform and $Y$ is standard normal, but why would this hold for larger values?)

If this does converge to a normal, what would the parameters of that normal be, in terms of $\alpha$ and $\beta$? (I expect the mean would be $\Phi^{-1}(\frac{\alpha}{\alpha+\beta})$, since $\frac{\alpha}{\alpha+\beta}$ is the mean of $X$ and is close to its mode $\frac{\alpha-1}{\alpha+\beta-2}$ when $\alpha$ and $\beta$ are large, but I don't know the standard deviation.)

(Put another way, this could be asking "does $\Phi(\mbox{Norm}(\mu, \sigma))$ converge to a beta distribution in some limit of $\mu$ and $\sigma$?" I'm not sure whether that's easier to answer.)

Simulation results

Here I show why I suspect the result is normal (since I can't back it up with math). Simulation of $Y$ can be done in R with qnorm and rbeta. For example, choosing the large parameters $\alpha=3000$ and $\beta=7000$:

hist(qnorm(rbeta(5000, 3000, 7000)))

This does look normal, and qqnorm and the Shapiro-Wilk test (in which normality is the null hypothesis) suggest so as well:

qqnorm(qnorm(rbeta(5000, 3000, 7000)))

shapiro.test(qnorm(rbeta(5000, 3000, 7000)))
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  qnorm(rbeta(5000, 3000, 7000))
#> W = 0.99954, p-value = 0.2838

To explore the normality more deeply, I perform 2,000 simulations, each time drawing 5,000 values of $Y$ and then running the test for normality. (I chose 5,000 values because that is the largest sample size shapiro.test accepts, which maximizes the power to detect departures from normality.)

If the distribution were truly normal, we would expect the p-values to be uniformly distributed (since the null is true). They are indeed close to uniform, suggesting that the distribution is very close to normal:

hist(replicate(2000, shapiro.test(qnorm(rbeta(5000, 3000, 7000)))$p.value))

Some experimentation shows that the higher $\alpha$ and $\beta$ are, the closer the distribution gets to normal (e.g. qnorm(rbeta(5000, 3, 7)) is quite far from normal, but try hist(replicate(2000, shapiro.test(qnorm(rbeta(5000, 30, 70)))$p.value)) and it appears to be somewhere in between).
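One way to run this experiment over several parameter sizes at once is sketched below in R. It assumes a fixed 3:7 ratio and only 200 replicates per setting (an arbitrary choice to keep the run short); `rejection_rate` is a hypothetical helper, not part of any package. If $Y$ were exactly normal, each rejection rate should hover around the nominal 5% level.

```r
# Sketch (assumed setup): proportion of Shapiro-Wilk tests that reject normality
# at the 5% level for Y = qnorm(X), X ~ Beta(alpha, beta), with alpha:beta = 3:7.
set.seed(1)

# Hypothetical helper: run n_rep tests on samples of size n and return the
# fraction of p-values below `level`.
rejection_rate <- function(alpha, beta, n = 5000, n_rep = 200, level = 0.05) {
  p_values <- replicate(n_rep, shapiro.test(qnorm(rbeta(n, alpha, beta)))$p.value)
  mean(p_values < level)
}

params <- data.frame(alpha = c(3, 30, 300, 3000), beta = c(7, 70, 700, 7000))
params$rejection_rate <- mapply(rejection_rate, params$alpha, params$beta)
print(params)
```

Increasing `n_rep` tightens each estimated rate at the cost of run time.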

Best Answer

Synopsis

You have rediscovered part of the construction described at Central Limit Theorem for Sample Medians, which illustrates an analysis of the median of a sample. (The analysis obviously applies, mutatis mutandis, to any quantile, not just the median). Therefore it is no surprise that for large Beta parameters (corresponding to large samples) a Normal distribution arises under the transformation described in the question. What is of interest is how close to Normal the distribution is even for small Beta parameters. That deserves an explanation.
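The link to sample quantiles can be checked by simulation. The $k$-th smallest of $n$ iid uniform values has a Beta$(k, n+1-k)$ distribution, so applying $\Phi^{-1}$ to it gives the $k$-th smallest of $n$ iid standard Normal values. The R sketch below, with the illustrative choice $n=11$, $k=4$ (that is, $\alpha=4$, $\beta=8$), compares the two routes by their simulated mean and standard deviation; it is a sanity check, not a proof.

```r
# Sketch: the k-th order statistic of n iid N(0, 1) draws versus qnorm() of a
# Beta(k, n + 1 - k) draw. n = 11, k = 4 are illustrative choices.
set.seed(2)
n <- 11
k <- 4
n_sim <- 20000

# Route 1: simulate Normal samples directly and keep the k-th smallest value
order_stat <- replicate(n_sim, sort(rnorm(n))[k])

# Route 2: the construction from the question, with alpha = k, beta = n + 1 - k
via_beta <- qnorm(rbeta(n_sim, k, n + 1 - k))

print(rbind(order_stat = c(mean = mean(order_stat), sd = sd(order_stat)),
            via_beta   = c(mean = mean(via_beta),   sd = sd(via_beta))))
```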

I will sketch an analysis below. To keep this post at a reasonable length, it involves a lot of suggestive hand-waving: I aim only to point out the key ideas. Let me therefore summarize the results here:

  1. When $\alpha$ is close to $\beta$, everything is nearly symmetric. This alone already makes the transformed distribution look Normal.

  2. The functions of the form $\Phi^{\alpha-1}(x)\left(1-\Phi(x)\right)^{\beta-1}$ look fairly Normal in the first place, even for small values of $\alpha$ and $\beta$ (provided both exceed $1$ and the ratio $\alpha/(\alpha+\beta)$ is not too close to $0$ or $1$).

  3. The apparent Normality of the transformed distribution is due to the fact that its density consists of a Normal density multiplied by a function in (2).

  4. As $\alpha$ and $\beta$ increase, the departure from Normality can be measured by the remainder terms in a Taylor series for the log density. After standardization, the term of order $n$ is proportional to the $-(n-2)/2$ power of $\alpha$ and $\beta$. This implies that eventually, for sufficiently large $\alpha$ and $\beta$, all terms of order $n=3$ or greater have become relatively small, leaving only a quadratic, which is precisely the log density of a Normal distribution.

Collectively, these behaviors nicely explain why even for small $\alpha$ and $\beta$ the non-extreme quantiles of an iid Normal sample look approximately Normal.


Analysis

Because it can be useful to generalize, let $F$ be any distribution function, although we have in mind $F=\Phi$.

The density function $g(y)$ of a Beta$(\alpha,\beta)$ variable is, by definition, proportional to

$$y^{\alpha-1}(1-y)^{\beta-1}dy.$$

Letting $y=F(x)$ be the probability integral transform of $x$ and writing $f$ for the derivative of $F$, it is immediate that $x$ has a density proportional to

$$G(x;\alpha,\beta)\,dx=F(x)^{\alpha-1}(1-F(x))^{\beta-1}f(x)\,dx.$$

Because this is a monotonic transformation of a strongly unimodal distribution (a Beta with $\alpha,\beta\ge 1$), unless $F$ is rather strange, the transformed distribution will be unimodal, too. To study how close to Normal it might be, let's examine the logarithm of its density,

$$\log G(x;\alpha,\beta) = (\alpha-1)\log F(x) + (\beta-1)\log(1-F(x)) + \log f(x) + C\tag{1}$$

where $C$ is an irrelevant constant of normalization.
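For $F=\Phi$ this density can be evaluated directly as `dbeta(pnorm(y), a, b) * dnorm(y)`. A minimal R check, assuming the illustrative parameters $\alpha=3$, $\beta=7$ and a hypothetical helper `dens_Y`, confirms that it integrates to 1 and that its mean matches a simulation of $\Phi^{-1}(X)$:

```r
# Sketch: density of Y = qnorm(X), X ~ Beta(a, b), by the change of variables
# y = F(x) with F = pnorm and f = dnorm. a = 3, b = 7 are illustrative.
dens_Y <- function(y, a, b) dbeta(pnorm(y), a, b) * dnorm(y)

a <- 3
b <- 7

# The density should integrate to 1 ...
total <- integrate(dens_Y, -Inf, Inf, a = a, b = b)$value

# ... and its mean should match a Monte Carlo estimate of E[Y]
mean_exact <- integrate(function(y) y * dens_Y(y, a, b), -Inf, Inf)$value
set.seed(3)
y_sim <- qnorm(rbeta(1e5, a, b))

print(c(total = total, mean_exact = mean_exact, mean_sim = mean(y_sim)))
```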

Expand the components of $\log G(x;\alpha,\beta)$ in Taylor series to order three around a value $x_0$ (which will be close to a mode). For instance, we may write the expansion of $\log F$ as

$$\log F(x) = c^{F}_0 + c^{F}_1 (x-x_0) + c^{F}_2(x-x_0)^2 + c^{F}_3h^3$$

for some $h$ with $|h| \le |x-x_0|$. Use a similar notation for $\log(1-F)$ and $\log f$.

Linear terms

The coefficient of the linear term in $(1)$ is therefore

$$g_1(\alpha,\beta) = (\alpha-1)c^{F}_1 + (\beta-1)c^{1-F}_1 + c^{f}_1.$$

When $x_0$ is a mode of $G(\,;\alpha,\beta)$, this expression is zero. Because the coefficients are continuous functions of $x_0$, the mode $x_0$ varies continuously as $\alpha$ and $\beta$ are varied. Moreover, once $\alpha$ and $\beta$ are sufficiently large, the $c^{f}_1$ term becomes relatively inconsequential. If we aim to study the limit as $\alpha\to\infty$ and $\beta\to\infty$ with the ratio $\alpha/\beta$ held at a constant $\gamma$, we may divide $g_1$ by $\beta$ and let $\beta\to\infty$. We may therefore choose, once and for all, a base point $x_0$ for which

$$\gamma c^{F}_1 + c^{1-F}_1 = 0.$$

A nice case is $\gamma=1$, where $\alpha=\beta$ throughout, and $F$ is symmetric about $0$. In that case it is obvious that $x_0=0$, the point where $F(x_0)=1/2$.

We have achieved a method whereby (a) in the limit, the first-order term in the Taylor series vanishes and (b) in the special case just described, the first-order term is always zero.
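For $F=\Phi$, $c^{F}_1=\phi(x_0)/\Phi(x_0)$ and $c^{1-F}_1=-\phi(x_0)/(1-\Phi(x_0))$, so the defining condition reduces to $\gamma(1-\Phi(x_0))=\Phi(x_0)$, that is, $\Phi(x_0)=\gamma/(1+\gamma)=\alpha/(\alpha+\beta)$. The base point is therefore $x_0=\Phi^{-1}(\alpha/(\alpha+\beta))$, the center guessed in the question. A numerical sketch in R, assuming the illustrative ratio $\gamma=3/7$, solves the condition with `uniroot` and compares it with the exact mode for $\alpha=30$, $\beta=70$, which differs slightly because of the $-1$ offsets and the $c^f_1$ term. The helpers `c1_F`, `c1_1mF`, and `log_G` (equation $(1)$ without $C$) are hypothetical names.

```r
# Sketch: solve gamma * c1_F(x0) + c1_1mF(x0) = 0 for F = pnorm and compare
# with the closed form x0 = qnorm(gamma / (1 + gamma)).
ratio <- 3 / 7   # plays the role of gamma = alpha / beta in the text (illustrative)

c1_F   <- function(x)  dnorm(x) / pnorm(x)                      # d/dx log F(x)
c1_1mF <- function(x) -dnorm(x) / pnorm(x, lower.tail = FALSE)  # d/dx log(1 - F(x))

x0_numeric <- uniroot(function(x) ratio * c1_F(x) + c1_1mF(x),
                      interval = c(-5, 5), tol = 1e-12)$root
x0_closed <- qnorm(ratio / (1 + ratio))

# Log density (1) for F = pnorm, dropping the constant C (hypothetical helper)
log_G <- function(x, a, b) {
  (a - 1) * pnorm(x, log.p = TRUE) +
    (b - 1) * pnorm(x, lower.tail = FALSE, log.p = TRUE) +
    dnorm(x, log = TRUE)
}

# Exact mode for finite alpha = 30, beta = 70 (same ratio)
mode_30_70 <- optimize(log_G, c(-5, 5), a = 30, b = 70,
                       maximum = TRUE, tol = 1e-10)$maximum

print(c(x0_numeric = x0_numeric, x0_closed = x0_closed, mode_30_70 = mode_30_70))
```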

Quadratic terms

The quadratic coefficient is the sum

$$g_2(\alpha,\beta) = (\alpha-1)c^{F}_2 + (\beta-1)c^{1-F}_2 + c^{f}_2.$$

Comparing to a Normal distribution, whose quadratic term is $-(1/2)(x-x_0)^2/\sigma^2$, we may estimate that $-1/(2g_2(\alpha,\beta))$ is approximately the variance of $G$. Let us standardize $G$ by rescaling $x-x_0$ by the square root of this variance. We don't really need the details; it suffices to understand that this rescaling multiplies the coefficient of $(x-x_0)^n$ in the Taylor expansion by $(-1/(2g_2(\alpha,\beta)))^{n/2}.$
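To see how well $-1/(2g_2)$ approximates the variance, one can estimate the second derivative of $(1)$ at the mode numerically (for $F=\Phi$), which amounts to a Laplace-type approximation. The R sketch below uses a central finite difference with an arbitrarily chosen step `h`; `log_G` and `approx_var` are hypothetical helper names, and $\alpha=3000$, $\beta=7000$ are taken from the question.

```r
# Sketch: Laplace-type approximation Var(Y) ~ -1 / (2 * g2) for F = pnorm,
# with 2 * g2 estimated by a central finite difference at the mode.
log_G <- function(x, a, b) {
  (a - 1) * pnorm(x, log.p = TRUE) +
    (b - 1) * pnorm(x, lower.tail = FALSE, log.p = TRUE) +
    dnorm(x, log = TRUE)
}

approx_var <- function(a, b, h = 1e-4) {
  # Mode of G (log G is concave for a, b >= 1 here, so optimize() finds it)
  x0 <- optimize(log_G, c(-8, 8), a = a, b = b, maximum = TRUE, tol = 1e-10)$maximum
  # Second derivative of log G at x0, which equals 2 * g2
  d2 <- (log_G(x0 + h, a, b) - 2 * log_G(x0, a, b) + log_G(x0 - h, a, b)) / h^2
  -1 / d2
}

a <- 3000
b <- 7000
set.seed(4)
var_sim <- var(qnorm(rbeta(1e5, a, b)))

print(c(approx = approx_var(a, b), simulated = var_sim))
```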

Remainder term

Here's the punchline: the term of order $n$ in the Taylor expansion is, according to our notation,

$$g_n(\alpha,\beta) = (\alpha-1)c^{F}_n + (\beta-1)c^{1-F}_n + c^{f}_n.$$

After standardization, it becomes

$$g_n^\prime(\alpha,\beta) = \frac{g_n(\alpha,\beta)}{(-2g_2(\alpha,\beta))^{n/2}}.$$

Both of the $g_i$ are affine combinations of $\alpha$ and $\beta$. The numerator therefore grows like the first power of $\alpha$ and $\beta$, while the denominator, raised to the $n/2$ power, grows like their $n/2$ power, so the net behavior is of order $-(n-2)/2$ in each of $\alpha$ and $\beta$. As these parameters grow large, then, each term in the Taylor expansion after the second decreases to zero asymptotically. In particular, the third-order remainder term becomes arbitrarily small.
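These decay rates can be checked numerically for $F=\Phi$. The R sketch below estimates $g'_3$ and $g'_4$ at the mode by central finite differences (the step sizes are arbitrary choices, scaled to the approximate standard deviation) for $\alpha:\beta=3:7$ multiplied by $k=1,10,100,1000$. The argument predicts $|g'_3|$ shrinking roughly like $k^{-1/2}$ and $g'_4$ of order $k^{-1}$. `std_coefs` and `log_G` are hypothetical helper names.

```r
# Sketch: standardized Taylor coefficients g'_3 and g'_4 of log G at its mode,
# for F = pnorm, estimated with central finite differences.
log_G <- function(x, a, b) {
  (a - 1) * pnorm(x, log.p = TRUE) +
    (b - 1) * pnorm(x, lower.tail = FALSE, log.p = TRUE) +
    dnorm(x, log = TRUE)
}

std_coefs <- function(a, b) {
  f <- function(x) log_G(x, a, b)
  x0 <- optimize(f, c(-8, 8), maximum = TRUE, tol = 1e-10)$maximum
  h0 <- 1e-4
  d2 <- (f(x0 + h0) - 2 * f(x0) + f(x0 - h0)) / h0^2
  h <- 0.2 * sqrt(-1 / d2)   # step of about 0.2 standard deviations
  d3 <- (f(x0 + 2 * h) - 2 * f(x0 + h) + 2 * f(x0 - h) - f(x0 - 2 * h)) / (2 * h^3)
  d4 <- (f(x0 + 2 * h) - 4 * f(x0 + h) + 6 * f(x0) - 4 * f(x0 - h) + f(x0 - 2 * h)) / h^4
  # Taylor coefficients are the derivatives divided by n!
  g2 <- d2 / 2
  g3 <- d3 / 6
  g4 <- d4 / 24
  c(g3_std = g3 / (-2 * g2)^(3 / 2), g4_std = g4 / (-2 * g2)^2)
}

ks <- c(1, 10, 100, 1000)
res <- cbind(k = ks, t(sapply(ks, function(k) std_coefs(3 * k, 7 * k))))
print(res)
```

Multiplying `k` by 100 should shrink `g3_std` by roughly a factor of 10 if the $k^{-1/2}$ rate holds.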

The case when $F$ is normal

The vanishing of the remainder term is particularly fast when $F$ is standard Normal, because in this case $\log f(x)$ is purely quadratic: it contributes nothing to the remainder terms. Consequently, the deviation of $G$ from Normality depends solely on the deviation of $F^{\alpha-1}(1-F)^{\beta-1}$ from Normality.

This deviation is fairly small even for small $\alpha$ and $\beta$. To illustrate, consider the case $\alpha=\beta$. Then $G$ is symmetric about $x_0=0$, whence the order-3 term vanishes altogether, and the remainder is of order $4$ in $x-x_0=x$.

Here is a plot showing how the standardized fourth-order term changes for small values of $\alpha \gt 1$ (with $\beta=\alpha$):

[Figure: standardized fourth-order term plotted against $\alpha$, with $\beta=\alpha$]

The value starts out at $0$ for $\alpha=\beta=1$, because then the distribution obviously is Normal ($\Phi^{-1}$ applied to a uniform distribution, which is what Beta$(1,1)$ is, gives a standard Normal distribution). Although it increases rapidly, it tops off at less than $0.008$, which is practically indistinguishable from zero. After that the asymptotic reciprocal decay kicks in, making the distribution ever closer to Normal as $\alpha$ increases beyond $2$.
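A curve of this kind can be computed numerically. In the symmetric case $\alpha=\beta$ with $F=\Phi$ the mode is $x_0=0$, so the R sketch below evaluates $g'_4=g_4/(-2g_2)^2$ by finite differences (step $h=0.01$, an arbitrary choice) on a grid of $\alpha$ from 1 to 10. `g4_std` and `log_G` are hypothetical helper names; the plot is drawn only in an interactive session.

```r
# Sketch: standardized fourth-order coefficient g'_4 = g4 / (-2 * g2)^2 of
# log G at x0 = 0 in the symmetric case alpha = beta, with F = pnorm.
log_G <- function(x, a, b) {
  (a - 1) * pnorm(x, log.p = TRUE) +
    (b - 1) * pnorm(x, lower.tail = FALSE, log.p = TRUE) +
    dnorm(x, log = TRUE)
}

g4_std <- function(a, h = 0.01) {
  f <- function(x) log_G(x, a, a)
  d2 <- (f(h) - 2 * f(0) + f(-h)) / h^2                                  # = 2 * g2
  d4 <- (f(2 * h) - 4 * f(h) + 6 * f(0) - 4 * f(-h) + f(-2 * h)) / h^4   # = 24 * g4
  (d4 / 24) / (-d2)^2
}

alphas <- seq(1, 10, by = 0.05)
vals <- sapply(alphas, g4_std)
print(c(max_value = max(vals), at_alpha = alphas[which.max(vals)]))

# Draw the curve only in an interactive session
if (interactive()) {
  plot(alphas, vals, type = "l", xlab = expression(alpha),
       ylab = "standardized fourth-order term")
}
```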