Solved – When is the distribution of product of two normal distributed variables near normal distribution

Tags: distributions, normal distribution, normality-assumption, random variable

The product of two normally distributed variables is clearly not normally distributed. For example, if $X \sim N( \mu_1,\sigma_1^2)$ and $Y \sim N( \mu_2,\sigma_2^2)$, then $XY$ does not have the distribution $N( \mu_1 \mu_2,\ \mu_1^2 \sigma_2^2+\mu_2^2\sigma_1^2)$.

I have been told that even though $XY$ is not normally distributed, its distribution is close to normal when $\mu_1$ and $\mu_2$ are not too small and $\sigma_1$ and $\sigma_2$ are not too large. Is this true?

Try the following R code:

    # Product of two zero-mean normals with small standard deviations
    n1 <- rnorm(10000, 0, .005)
    n2 <- rnorm(10000, 0, .005)
    n  <- n1 * n2
    # Kernel density estimate of the product
    d  <- density(n)
    plot(d, lwd=2)
    # Normal density with matching mean and sd, evaluated on the same grid as d
    dn <- dnorm(d$x, mean=mean(n), sd=sd(n))
    lines(d$x, dn, col=2, lwd=2)
    legend('topright', legend=c('Estimated density', 'Normal distribution'),
           lwd=2, lty=c(1,1), col=c(1,2))

[Figure: density estimation when $\sigma_1=\sigma_2=0.005$]

It seems the distribution is near normal only when both conditions are met. Is there any theoretical analysis?

Best Answer

(this answer uses parts of @whuber's comment)

Let $X,Y$ be two independent normals. We can write the product as $$ XY = \frac14 \left( (X+Y)^2 - (X-Y)^2 \right), $$ so it has the distribution of a (scaled) difference of two noncentral chi-square random variables (central if both means are zero). Note that if the variances are equal, the two terms are independent. Since the chi-square distribution is a special case of the gamma distribution, the question "Generic sum of Gamma random variables" is relevant. I will give a very special case of this, taken from the encyclopedic reference https://www.amazon.com/Probability-Distributions-Involving-Gaussian-Variables/dp/0387346570
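
The identity and the independence remark can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the function names are illustrative. It relies on the fact that $\operatorname{Cov}(X+Y, X-Y) = \sigma_1^2 - \sigma_2^2$ for independent $X$ and $Y$, so the sum and difference are uncorrelated (and, being jointly normal, independent) exactly when the variances are equal.

```python
import random


def product_via_squares(x, y):
    """Return x*y computed as ((x + y)^2 - (x - y)^2) / 4."""
    return ((x + y) ** 2 - (x - y) ** 2) / 4.0


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def sum_diff_cov(sigma1, sigma2, n=50_000, seed=1):
    """Simulated Cov(X + Y, X - Y) for independent zero-mean normals."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma1) for _ in range(n)]
    ys = [rng.gauss(0.0, sigma2) for _ in range(n)]
    sums = [x + y for x, y in zip(xs, ys)]
    diffs = [x - y for x, y in zip(xs, ys)]
    return sample_cov(sums, diffs)


if __name__ == "__main__":
    print(product_via_squares(3.0, -2.0))
    print(sum_diff_cov(1.0, 1.0))  # equal variances
    print(sum_diff_cov(2.0, 1.0))  # unequal variances
```

With equal variances the simulated covariance should be close to 0; with $\sigma_1=2$, $\sigma_2=1$ it should be close to $\sigma_1^2-\sigma_2^2=3$.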

When $X$ and $Y$ are independent and zero-mean, with possibly different variances, the density function of the product $Z=XY$ is given by $$ f(z)= \frac1{\pi \sigma_1 \sigma_2} K_0\left(\frac{|z|}{\sigma_1 \sigma_2}\right), $$ where $K_0$ is the modified Bessel function of the second kind.

This can be written in R as

    dprodnorm  <-  function(x, sigma1=1, sigma2=1) {
       (1/(pi*sigma1*sigma2)) * besselK(abs(x)/(sigma1*sigma2),  0)
    }
    ### Numerical check:
    integrate( function(x) dprodnorm(x), lower=-Inf,  upper=Inf)
    ## 0.9999999 with absolute error < 3e-06

Let us plot this, together with some simulations:

    set.seed(7*11*13)  
    Z  <-  rnorm(10000) * rnorm(10000)
    
    hist(Z, prob=TRUE, nclass="scott", ylim=c(0, 1.5), 
            main="histogram and density of product of independent normals")
    plot( function(x) dprodnorm(x),  from=-5,  to=5,  n=1001,  
          col="red", add=TRUE, lwd=3)
    ### Change to nclass="fd" gives a closer fit

The plot shows quite clearly that the distribution is not close to normal.

The stated reference also gives more involved cases (non-zero means, ...), but then the expressions for the density functions become so complicated that only the characteristic functions are given; these are still reasonably simple and can be inverted to get densities.
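
The question's condition (means not too small, standard deviations not too large) can be made concrete through the ratios $\delta_i = \mu_i/\sigma_i$. Writing $X=\mu_1+\sigma_1 Z_1$ and $Y=\mu_2+\sigma_2 Z_2$ with independent standard normals gives $XY = \mu_1\mu_2 + \mu_1\sigma_2 Z_2 + \mu_2\sigma_1 Z_1 + \sigma_1\sigma_2 Z_1 Z_2$, where only the last term is non-normal. Computing moments from this decomposition gives skewness $6\delta_1\delta_2/(\delta_1^2+\delta_2^2+1)^{3/2}$ and excess kurtosis $6(2\delta_1^2+2\delta_2^2+1)/(\delta_1^2+\delta_2^2+1)^2$, both of which tend to 0 as $\delta_1^2+\delta_2^2$ grows. The Python sketch below (function names are illustrative, not from the reference) compares these expressions with simulation. Skewness and kurtosis are only a summary of shape, so this is a rough indicator rather than a full answer.

```python
import random


def product_shape(mu1, sigma1, mu2, sigma2):
    """Skewness and excess kurtosis of XY for independent normals X, Y."""
    d1, d2 = mu1 / sigma1, mu2 / sigma2  # mean-to-sd ratios
    s = d1 * d1 + d2 * d2 + 1.0
    skew = 6.0 * d1 * d2 / s ** 1.5
    excess_kurt = 6.0 * (2.0 * d1 * d1 + 2.0 * d2 * d2 + 1.0) / s ** 2
    return skew, excess_kurt


def simulated_shape(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Sample skewness and excess kurtosis of simulated products."""
    rng = random.Random(seed)
    z = [rng.gauss(mu1, sigma1) * rng.gauss(mu2, sigma2) for _ in range(n)]
    m = sum(z) / n
    m2 = sum((v - m) ** 2 for v in z) / n
    m3 = sum((v - m) ** 3 for v in z) / n
    m4 = sum((v - m) ** 4 for v in z) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    for mu in (0.0, 1.0, 5.0, 20.0):
        exact = product_shape(mu, 1.0, mu, 1.0)
        approx = simulated_shape(mu, 1.0, mu, 1.0)
        print(mu, exact, approx)
```

For zero means the excess kurtosis is 6, matching the sharply peaked Bessel density above; for means that are many standard deviations from zero both quantities are small, which is when a normal approximation becomes reasonable.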

Related Solutions

Solved – Test if two normally distributed random variables have the same mean

The test

Suppose (a) $X_1 \sim N(\mu_1, \sigma_1)$, (b) $X_2 \sim N(\mu_2, \sigma_2)$ and (c) $X_1$ and $X_2$ are independent, where the second parameter denotes the standard deviation. If you draw a sample of size $N_1$ from the distribution of $X_1$ and a sample of size $N_2$ from the distribution of $X_2$, then the arithmetic average of the sample from $X_i$ is $\bar{x}_i$, and it is distributed as $\bar{X}_i \sim N(\mu_i, \frac{\sigma_i}{\sqrt{N_i}})$. Because the variables are independent, $\bar{X}_1-\bar{X}_2 \sim N(\mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}})$.

You seem to have some a priori knowledge about the 'truth', namely that $\mu_1 \le \mu_2$. So if you want to find evidence that $\mu_1 < \mu_2$ (see What follows if we fail to reject the null hypothesis?), your alternative should be $H_1: \mu_1 < \mu_2$. To 'demonstrate' this, one assumes the opposite; since you say that $\mu_1 \le \mu_2$, the opposite is $H_0: \mu_1 = \mu_2$.

So you have a one-sided test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 < 0$.

By the above, if $H_0$ is true, then $\mu_1-\mu_2=0$ and therefore $\bar{X}_1-\bar{X}_2 \sim N(0, \sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}})$. With this you can define a left-tailed, one-sided test, and using the Neyman-Pearson lemma it can be shown to be the uniformly most powerful test. If you know the $\sigma_i$, you can use the normal distribution to define the critical region in the left tail; if you don't know them, you estimate them from the samples and then define the critical region using the t-distribution.

To define the test you have to (1) choose a significance level $\alpha$, (2) draw a sample of size $N_1$ from $X_1$ and of size $N_2$ from $X_2$, and (3) compute $\bar{x}_i$ for each of these samples and then compute the p-value of $\bar{x}_1 - \bar{x}_2$, knowing that $\bar{X}_1-\bar{X}_2 \sim N(0, \sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}})$ under $H_0$. If the obtained p-value is below $\alpha$, the test concludes that there is evidence in favour of $H_1$. Equivalently, one can compute the $\alpha$-quantile $q_{\alpha}^0$ of the distribution under $H_0$; $H_0$ is rejected when $\bar{x}_1-\bar{x}_2 \le q_{\alpha}^0$.
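
The three steps can be sketched in Python with the standard library, assuming the $\sigma_i$ are known (the normal rather than the t-distribution case). The function name and the returned dictionary keys are illustrative choices, not part of the original answer.

```python
from math import sqrt
from statistics import NormalDist, fmean


def left_tail_z_test(sample1, sample2, sigma1, sigma2, alpha=0.05):
    """One-sided test of H0: mu1 - mu2 = 0 against H1: mu1 - mu2 < 0.

    Assumes known standard deviations sigma1 and sigma2.
    """
    # Standard deviation of the difference of the two sample means
    se = sqrt(sigma1 ** 2 / len(sample1) + sigma2 ** 2 / len(sample2))
    diff = fmean(sample1) - fmean(sample2)
    null = NormalDist(0.0, se)       # distribution of the difference under H0
    p_value = null.cdf(diff)         # left-tail probability
    critical = null.inv_cdf(alpha)   # alpha-quantile under H0
    return {
        "diff": diff,
        "p_value": p_value,
        "critical": critical,
        "reject": diff <= critical,
    }


if __name__ == "__main__":
    result = left_tail_z_test([0.9, 1.1, 1.0, 0.8], [1.2, 1.4, 1.3, 1.5],
                              sigma1=0.2, sigma2=0.2)
    print(result)
```

Rejecting when the difference falls at or below the critical value is the same decision as rejecting when the p-value is at most $\alpha$.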

The probability of a false positive is equal to the significance level $\alpha$ that you choose; the probability of a false negative can only be computed for a given value of $\mu_1 - \mu_2$.

The sample sizes

If you have such a value for $\mu_1 - \mu_2$, e.g. a difference of -0.05, then you can compute the type II error for this value.

The type II error is the probability that $H_0$ is accepted when it is false. To compute it, let us assume that $H_0$ is false and that $\mu_1-\mu_2 = -0.05$. In that case $\bar{X}_1-\bar{X}_2 \sim N(-0.05, \sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}})$ (the mean has changed).

Now we have to compute the probability that $H_0$ is accepted when this alternative is true. This is the probability of observing a value of $\bar{x}_1-\bar{x}_2$ greater than $q_\alpha^0$ under the above distribution with mean -0.05.

It will be a function of $N_1$ and $N_2$. You can then fix a value for $N_2$ (the one that is difficult to sample) and find $N_1$ by fixing a type II error that is acceptable for you.
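
As a rough sketch of this sample-size calculation, the Python code below computes the type II error for given $N_1$, $N_2$ and searches for the smallest $N_1$ that reaches a target. It assumes known $\sigma_i$ and a negative true difference; the function names, the default target of 0.2 and the search bound are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(delta, sigma1, sigma2, n1, n2, alpha=0.05):
    """P(accept H0) when the true mean difference mu1 - mu2 equals delta."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    q_alpha = NormalDist(0.0, se).inv_cdf(alpha)  # critical value under H0
    # H0 is accepted when the observed difference exceeds q_alpha
    return 1.0 - NormalDist(delta, se).cdf(q_alpha)


def smallest_n1(delta, sigma1, sigma2, n2, alpha=0.05, beta=0.2,
                n1_max=10 ** 6):
    """Smallest n1 with type II error <= beta, or None if unreachable.

    Assumes delta < 0, so the error decreases as n1 grows and a
    binary search is valid.
    """
    def ok(n1):
        return type_two_error(delta, sigma1, sigma2, n1, n2, alpha) <= beta

    if not ok(n1_max):
        return None  # n2 is too small for any n1 up to n1_max
    lo, hi = 1, n1_max
    while lo < hi:
        mid = (lo + hi) // 2
        if ok(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    print(type_two_error(-0.05, 0.1, 0.1, n1=30, n2=50))
    print(smallest_n1(-0.05, 0.1, 0.1, n2=50))
```

Because the standard deviation of the difference cannot fall below $\sigma_2/\sqrt{N_2}$, a fixed small $N_2$ can make the target unreachable no matter how large $N_1$ is; the function returns `None` in that case.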

Normal-Distribution – Conditional Distribution of One Normal Distribution Smaller or Bigger Than Another

Whether above solution is correct

Yes.

How to get the mean and sd for $P(X|X<Y)$ and $P(X|X>Y)$ if they are still normal?

They are not normal.

Proof:

The density $P(X | X>Y) = \frac{\Phi(\frac{x-\mu_2}{\sigma_2})\phi_x(\mu_1,\sigma_1)}{1-\Phi(\frac{\mu_2-\mu_1}{\sqrt{\sigma_2^2+\sigma_1^2}})}$ is the product of a normal CDF factor, $\Phi(\frac{x-\mu_2}{\sigma_2})$, and a normal density factor, $\phi_x(\mu_1,\sigma_1)$. The CDF factor increases from 0 to 1 in $x$, so it tilts the normal density toward larger $x$, and the product is no longer a normal density.

As a simple illustration of how a product involving a normal and a uniform variable departs from normality, consider independent $X_1 \sim N(0, 1)$ and $X_2 \sim U(0,1)$. For $z > 0$, the distribution of the product $Z = X_1X_2$ is given by:

\begin{align*}
F_Z(z) &= P(Z \leq z)\\
&= P(X_1X_2 \leq z)\\
&= \int_{x_1\leq 0} \phi_{X_1}(x_1)\ dx_1 + \int_{x_1 > 0}P(X_2 \leq \frac{z}{x_1}) \phi_{X_1}(x_1)\ dx_1\\
&= \frac{1}{2} + \int_{0}^{z} \phi_{X_1}(x_1)\ dx_1 + \int_{z}^{\infty}\frac{z}{x_1} \phi_{X_1}(x_1)\ dx_1\\
&= \Phi(z) + z\int_{z}^{\infty}\frac{\phi_{X_1}(x_1)}{x_1}\ dx_1
\end{align*}

(for $x_1 \le 0$ the product is never positive, and for $x_1 > 0$ the uniform probability $P(X_2 \leq \frac{z}{x_1})$ equals $\min(1, \frac{z}{x_1})$), which does not match the CDF of a normal.

You can however still check if your solution is correct by simulation:

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import norm

    mu1, sigma1 = 1, 2
    mu2, sigma2 = 2, 3

    np.random.seed(42)
    X = np.random.normal(mu1, sigma1, 1000)
    Y = np.random.normal(mu2, sigma2, 1000)

    # Samples of X given X > Y and of X given X < Y
    X_given_XgY = X[X > Y]
    X_given_XlY = X[X < Y]

    # P(X < Y); P(X > Y) is its complement
    p_XlY = norm.cdf((mu2 - mu1) / np.sqrt(sigma1**2 + sigma2**2))

    # Histogram of X | X > Y against the density P(Y < x) f_X(x) / P(X > Y)
    plt.figure()
    count, bins, ignored = plt.hist(X_given_XgY, 30, density=True)
    plt.plot(bins,
             norm.cdf((bins - mu2) / sigma2) * norm.pdf(bins, mu1, sigma1)
             / (1 - p_XlY),
             linewidth=2, color='r')
    plt.title('$P(X|X>Y)$')

    # Histogram of X | X < Y against the density P(Y > x) f_X(x) / P(X < Y)
    plt.figure()
    count, bins, ignored = plt.hist(X_given_XlY, 30, density=True)
    plt.plot(bins,
             (1 - norm.cdf((bins - mu2) / sigma2)) * norm.pdf(bins, mu1, sigma1)
             / p_XlY,
             linewidth=2, color='r')
    plt.title('$P(X|X<Y)$')

    plt.show()
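
Besides the visual comparison, the formula can be checked numerically without plotting. The sketch below is an illustrative addition using only the standard library; the function names and the integration range of ten standard deviations are assumptions. It integrates the density of $X$ given $X>Y$ with the trapezoid rule to confirm that it has total mass close to 1, and computes its mean.

```python
from math import sqrt
from statistics import NormalDist


def make_conditional_density(mu1, sigma1, mu2, sigma2):
    """Return the density of X given X > Y for independent normals X, Y."""
    dist_x = NormalDist(mu1, sigma1)
    dist_y = NormalDist(mu2, sigma2)
    # P(X > Y), using X - Y ~ N(mu1 - mu2, sqrt(sigma1^2 + sigma2^2))
    p_x_gt_y = NormalDist().cdf((mu1 - mu2) / sqrt(sigma1 ** 2 + sigma2 ** 2))

    def density(x):
        # P(Y < x) * f_X(x) / P(X > Y)
        return dist_y.cdf(x) * dist_x.pdf(x) / p_x_gt_y

    return density


def trapezoid(f, lo, hi, steps=20_000):
    """Trapezoid-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    mu1, sigma1, mu2, sigma2 = 1, 2, 2, 3
    f = make_conditional_density(mu1, sigma1, mu2, sigma2)
    lo, hi = mu1 - 10 * sigma1, mu1 + 10 * sigma1
    print("total mass:", trapezoid(f, lo, hi))
    print("mean:", trapezoid(lambda x: x * f(x), lo, hi))
```

A total mass near 1 supports the normalizing constant in the formula, and a mean above $\mu_1$ reflects the tilt toward larger values that conditioning on $X>Y$ introduces.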
