Solved – How to derive the Projected normal distribution

circular statisticsnormal distributionoffset

Suppose we have a bivariate normal variable $\mathbf{x}= (x_1, x_2)$ with mean $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$.
I need to obtain the pdf of the transformation $||\mathbf{x}||^{-1} \mathbf{x}$: the coordinates of a variable in the unit circle.

In directional statistic this is a well known distribution called the Projected normal, or offset normal. I have already the pdf but i want to derive it by myself. I'm able to do a variable transformation when there is a 1 to 1 relation but this is not the case.

Can someone put me in the right way?

Just for completition: http://www.sciencedirect.com/science/article/pii/S1572312712000457 A pdf where it is explained the projected normal distribution

Best Answer

There is a trick for calculating the pdf of 2-d projected normal, which can be found on pp. 52 of Mardia (1972).

After change of variable $x_1 = r\, \cos\theta$ and $x_2 = r\,\sin\theta$, we have $f(\theta)= \int_{0}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_1\sigma_2\sqrt{1-\rho^2} }r\,\mbox{exp}\{-\frac{\sigma_2^2(r\cos\theta-\mu_1)^2-2\rho(r\cos\theta-\mu_1)(r\sin\theta-\mu_2)+\sigma_1^2(r\sin\theta-\mu_2)^2}{2\sigma_1^2\sigma_2^2(1-\rho^2)}\}\mbox{d}r$,

Use the following result, $d^2\int_{0}^{\infty}r\,\mbox{exp}\{-\frac{1}{2}d^2(r^2-2br)\}\mbox{d}r = 1+ (2\pi)^{\frac{1}{2}}bd\,\mbox{e}^{\frac{1}{2}b^2d^2}\Phi(bd),$

where $\Phi(x)$ is the cdf of $\mbox{N}(0,1)$.

With $d^2 = \frac{\sigma_2^2\cos^2\theta-\rho\sigma_1\sigma_2\sin2\theta+\sigma_1^2\sin^2\theta}{(1-\rho^2)\sigma_1^2\sigma_2^2}$ and $b = \frac{\mu_1\sigma_2(\sigma_2\cos\theta-\rho\sigma_1\sin\theta)+\mu_2\sigma_1(\sigma_1\sin\theta-\rho\sigma_2\cos\theta)}{\sigma_2^2\cos^2\theta-\rho\sigma_1\sigma_2\sin2\theta+\sigma_1^2\sin^2\theta}$, you are going to obtain the pdf listed in the above paper.

Related Solutions

Solved – When is the distribution of product of two normal distributed variables near normal distribution

(this answer uses parts of @whuber's comment)

Let $X,Y$ be two independent normals. We can write the product as $$ XY = \frac14 \left( (X+Y)^2 - (X-Y)^2 \right) $$ will have the distribution of the difference (scaled) of two noncentral chisquare random variables (central if both have zero means). Note that if the variances are equal, the two terms will be independent. Since chisquare distribution is a case of gamma, Generic sum of Gamma random variables is relevant. I will give a very special case of this, taken from the encyclopedic reference https://www.amazon.com/Probability-Distributions-Involving-Gaussian-Variables/dp/0387346570

When $X$ and $Y$ are independent, zero-mean with possibly different variances the density function of the product $Z=XY$ is given by $$ f(z)= \frac1{\pi \sigma_1 \sigma_2} K_0(\frac{|z|}{\sigma_1 \sigma_2}) $$ where $K_0$ is the modified Bessel function of the second kind.

This can be written in R as

    dprodnorm  <-  function(x, sigma1=1, sigma2=1) {
       (1/(pi*sigma1*sigma2)) * besselK(abs(x)/(sigma1*sigma2),  0)
    }
    ### Numerical check:
    integrate( function(x) dprodnorm(x), lower=-Inf,  upper=Inf)
    0.9999999 with absolute error < 3e-06

Let us plot this, together with some simulations:

    set.seed(7*11*13)  
    Z  <-  rnorm(10000) * rnorm(10000)
    
    hist(Z, prob=TRUE, nclass="scott", ylim=c(0, 1.5), 
            main="histogram and density of product of independent 
                  normals")
    plot( function(x) dprodnorm(x),  from=-5,  to=5,  n=1001,  
          col="red", add=TRUE, lwd=3)
    ### Change to nclass="fd" gives a closer fit

The plot shows quite clearly that the distribution is not close to normal.

The stated reference do also give more involved cases (non-zero means ...) but then expressions for density functions becomes so complicated that they only gives characteristic function, which still are reasonably simple, and can be inverted to get densities.

Normal Distribution – How to Calculate Total Probability Inside a Slice of Bivariate Normal Distribution in R

Yes, a Normal approximation works--but not in all cases. We need to do some analysis to identify when the approximation is a good one.

Exact solution

Re-express $(X_1,X_2)$ in standardized units so that they have zero means and unit variances. Letting $\Phi$ be the standard Normal distribution function (its CDF), it is well known from the theory of ordinary least squares regression that

$$\Pr(X_1 \le z\,|\, X_2 = x) = \Phi\left(\frac{z - \rho x}{\sqrt{1-\rho^2}}\right).$$

The desired probability then can be obtained by integrating:

$$\eqalign{\Pr(X_1 \le z\,|\, a \lt X_2 \le b) &= \frac{1}{\Phi(b)-\Phi(a)}\int_a^b \Pr(X_1\le z\,|\, X_2=x) \phi(x)\,dx \\&= \frac{1}{\Phi(b)-\Phi(a)}\int_a^b \Phi\left(\frac{z - \rho x}{\sqrt{1-\rho^2}}\right) \phi(x)\,dx.}$$

This appears to require numerical integration (although the result for $(a,b)=\mathbb{R}$ is obtainable in closed form: see How can I calculate $\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw$).

Approximating the distribution

This expression can be differentiated under the integral sign (with respect to $z$) to obtain the PDF,

$$f(z\,|\, a \lt X_2 \le b) = \phi(z)\ \frac{\Phi\left(\frac{b-\rho z}{\sqrt{1-\rho^2}}\right) - \Phi\left(\frac{a-\rho z}{\sqrt{1-\rho^2}}\right)}{\Phi(b) - \Phi(a)}.$$

This exhibits the PDF as product of the standard Normal PDF $\phi$ and a "correction". When $b-a$ is small compared to $\sqrt{1-\rho^2}$ (specifically, when $(b-a)^2 \ll 1-\rho^2$), we might approximate the difference in the numerator with the first derivative:

$$\Phi\left(\frac{b-\rho z}{\sqrt{1-\rho^2}}\right) - \Phi\left(\frac{a-\rho z}{\sqrt{1-\rho^2}}\right)\approx \phi\left(\frac{(a+b)/2-\rho z}{\sqrt{1-\rho^2}}\right)\frac{b-a}{\sqrt{1-\rho^2}}.$$

The error in this approximation is uniformly bounded (across all values of $z$) because the second derivative of $\Phi$ is bounded.

With this approximation, and completing the square, we obtain

$$f(z\,|\, a \lt X_2 \le b) \approx \phi\left(z; \rho(a+b)/2, \sqrt{1-\rho^2}\right) \frac{(b-a)\exp\left(-(a+b)^2/8\right)}{(\Phi(b)-\Phi(a))\sqrt{2\pi}}.$$

($\phi(*; \mu, \sigma)$ denotes the PDF of a Normal distribution of mean $\mu$ and standard deviation $\sigma$.)

Moreover, the right hand factor (which does not depend on $z$) must be very close to $1$, because the $\phi$ term integrates to unity, whence

$$f(z\,|\, a \lt X_2 \le b) \approx \phi\left(z; \rho(a+b)/2, \sqrt{1-\rho^2}\right).$$

All this makes very good sense: the conditional distribution of $X_1$ is approximated by its conditional distribution at the midpoint of the interval, $(a+b)/2$, where it has mean $\rho(a+b)/2$ and standard deviation $\sqrt{1-\rho^2}$. The error is proportional to the width of the interval $b-a$ and to an expression dominated by $\exp(-(a+b)^2/8)$, which becomes important only when both $a$ and $b$ are out in the same tail. Therefore, this Normal approximation works for narrow slices not too far into the tails of the bivariate distribution. Moreover, the difference between

$$\frac{(b-a)\exp\left(-(a+b)^2/8\right)}{(\Phi(b)-\Phi(a))\sqrt{2\pi}}$$

and $1$ serves as an excellent check of the quality of the approximation.

Edit

To check these conclusions, I simulated data in R for various values of $b$ and $\rho$ ($a=-3$ in all cases), drew their empirical density, and superimposed on that the theoretical (blue) and approximate (red) densities for comparison. (You cannot see the density plots because the theoretical plots fit over them almost perfectly.) As $|\rho|$ gets close to $1$, the approximation grows poorer: this deserves further study. Clear the approximation is excellent for sufficiently small values of $b-a$.

#
# Numerical integration, to give a correct value.
#
f <- function(z, a, b, rho, value.only=FALSE, ...) {
  g <- function(x) pnorm((z - rho*x)/sqrt(1-rho^2)) * dnorm(x) / (pnorm(b) - pnorm(a))
  u <- integrate(g, a, b, ...)
  if (value.only) return(u$value) else return(u)
}
#
# Set up the problem.
#
a <- -3  # Left endpoint
n <- 1e5 # Simulation size
par(mfrow=c(2,3))
for (rho in c(1/4, -2/3)) {
  for (b in c(-2.5, -2, -1.5)) {
    z <- seq((a-3)*rho, (b+3)*rho, length.out=101)
    #
    # Check the approximation (`v` needs to be small).
    #
    v <- (b-a) * exp(-(a+b)^2/8) / (pnorm(b) - pnorm(a)) / sqrt(2*pi) - 1
    #
    # Simulate some values of (x1, x2).
    #
    x.2 <- qnorm(runif(n, pnorm(a), pnorm(b)))  # Normal between `a` and `b`
    x.1 <- rho*x.2 + rnorm(n, sd=sqrt(1-rho^2))
    #
    # Compare the simulated distribution to the theoretical and approximate
    # densities.
    #
    x.hat <- density(x.1)
    plot(x.hat, type="l", lwd=2, 
         main="Simulated, True, and Approximate",
         sub=paste0("a=", round(a,2), ", b=", round(b, 2), ", rho=", round(rho, 2), 
         "; v=", round(v,3)),
         xlab="X.1", ylab="Density")

    # Theoretical
    curve(dnorm(x) * (pnorm((b-rho*x)/sqrt(1-rho^2)) - pnorm((a-rho*x)/sqrt(1-rho^2))) /
       (pnorm(b) - pnorm(a)), lwd=2, col="Blue", add=TRUE)

    # Approximate
    curve(dnorm(x, rho*(a+b)/2, sqrt(1-rho^2)), col="Red", lwd=2, add=TRUE)
  }
}