Solved – Calculating variance of a piecewise function of bivariate normal variables

Tags: bivariate, multivariate normal distribution, variance

I have a random vector $X \in \mathbb{R}^2$ with a bivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.

The vector $X$ is composed of two variables, $X = (x_1, x_2)$.

The function $f$ depends on $x_1$ only:

$f(x_1) = \begin{cases}
0 & \text{for } x_1 < 0\\
x_1 & \text{for } x_1 \geq 0
\end{cases}$

I have already calculated that, for $\mu_1 = 0$, the mean of $f(x_1)$ is $\frac{\sigma_{11}}{\sqrt{2\pi}}$ (writing $\sigma_{11}$ for the standard deviation of $x_1$).
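As a quick sanity check of that mean, here is a minimal Monte Carlo sketch (assuming $\mu_1 = 0$ and an arbitrary example value for the standard deviation $\sigma_{11}$):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.7  # assumed example value for the standard deviation of x1
x1 = rng.normal(0.0, sigma, size=1_000_000)  # mu_1 = 0 special case
y = np.maximum(x1, 0.0)  # f(x1) = max(x1, 0)

print(np.mean(y))                  # sample mean of f(x1)
print(sigma / np.sqrt(2 * np.pi))  # predicted mean sigma_11 / sqrt(2*pi)
```

The two printed values should agree to about two decimal places at this sample size.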

I am trying to find the variance of $f(x_1)$. I thought this would be straightforward, but when I try to verify it numerically the results don't match the predictions. This is my approach:

Let $y=f(x_1)$

Let $g(x_1,x_2, \mu,\Sigma)$ be the probability density function of the bivariate normal variable.

$Var[y]=E[y^2]-E[y]^2=E[y^2]-\frac{\sigma_{11}^2}{2\pi}$

$E[y^2] = \int_{x_1=-\infty}^{\infty} \int_{x_2=-\infty}^{\infty} y^2 f(x_1) g(x_1,x_2, \mu,\Sigma) dx_2 dx_1$

For $x_1 \geq 0$, $y=x_1$; otherwise $y=0$, so the integral simplifies to

$E[y^2] = \int_{x_1=0}^{\infty} \int_{x_2=-\infty}^{\infty} x_1^2 f(x_1) g(x_1,x_2, \mu,\Sigma) dx_2 dx_1$

And, since $x_1^2 f(x_1)$ does not depend on $x_2$:

$E[y^2] = \int_{x_1=0}^{\infty} x_1^2 f(x_1) \int_{x_2=-\infty}^{\infty} g(x_1,x_2, \mu,\Sigma) dx_2 dx_1$
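The inner integral is just the marginal density of $x_1$, which is why the double integral collapses to a one-dimensional one. This identity can be checked numerically; a minimal sketch, assuming example values for $\mu$ and $\Sigma$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

mu = np.array([0.0, 0.5])       # assumed example mean vector
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])  # assumed example covariance matrix
g = multivariate_normal(mu, Sigma).pdf

x1 = 0.8  # arbitrary point at which to compare the two densities
inner, _ = quad(lambda x2: g([x1, x2]), -np.inf, np.inf)  # integrate out x2
marginal = norm(mu[0], np.sqrt(Sigma[0, 0])).pdf(x1)      # marginal of x1

print(inner, marginal)  # should agree up to quadrature error
```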

I use Wolfram Alpha to get the result (for $\mu_{x_1}=0$) and substitute it into the equation for the variance:

$Var[y]=E[y^2]-\frac{\sigma_{11}^2}{2\pi}$

The formula for the variance matches a Monte Carlo numerical integration, so Wolfram Alpha is doing the integration correctly. Unfortunately, the predictions don't match the sample variance from a numerical simulation of the bivariate distribution. Because of this I think the way I set up the integration is incorrect. It seems straightforward, but I haven't done this with piecewise functions or multiple variables before.

Is there any obvious error in the way I set up the integral for $E[y^2]$?

Best Answer

Yes, there is an error in the setup: since $y=f(x_1)$, the integrand $y^2 f(x_1) = f(x_1)^3$ contains an extra factor of $f(x_1)$. The integrand for $E[y^2]$ should just be $f(x_1)^2\,g$, i.e. $x_1^2\,g$ for $x_1 \geq 0$.

Since $(X_1,X_2)$ is bivariate normal, $X_1\sim N(\mu_1,\sigma_{11}^2)$. For $Y=X_1^+=\max(X_1,0)$, the mean you derived is for the special case $\mu_1=0$, specifically $$\operatorname{E}[Y]=\int_0^{\infty}xf_{X_1}(x)\,dx=\frac{\sigma_{11}}{\sqrt{2\pi}}.$$ Moreover, we can compute $$\operatorname{E}[Y^2]=\int_0^{\infty}x^2f_{X_1}(x)\,dx=\frac{\sigma_{11}^2}{2},$$ so $$\operatorname{Var}(Y)=\frac{\sigma_{11}^2}{2}\left(1-\frac{1}{\pi}\right).$$

This can also be derived from the variance of the half-normal distribution. Note first that $|X_1|=X_1^++X_1^-$, where $X_1^-=\max(-X_1,0)$. Then $\operatorname{E}[X_1^+X_1^-]=0$, since whenever one of the two is greater than $0$ the other is $0$, so that $$\operatorname{Cov}(X_1^+,X_1^-)=-\operatorname{E}[X_1^+]\operatorname{E}[X_1^-]=-\frac{\sigma_{11}^2}{2\pi}.$$ Finally, notice that $\operatorname{Var}(X_1^+)=\operatorname{Var}(X_1^-)$ by symmetry. It follows that $$\sigma_{11}^2\left(1-\frac{2}{\pi}\right)=\operatorname{Var}(|X_1|)=2\operatorname{Var}(X_1^+)+2\operatorname{Cov}(X_1^+,X_1^-)=2\operatorname{Var}(X_1^+)-\frac{\sigma_{11}^2}{\pi},$$ which yields $$\operatorname{Var}(X_1^+)=\frac{\sigma_{11}^2}{2}\left(1-\frac{1}{\pi}\right).$$
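The closed-form variance is easy to confirm by simulating the full bivariate distribution, as in the question. A minimal sketch, assuming $\mu_1=0$ and an example covariance matrix (the $x_2$ component is irrelevant to $Y$):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma11 = 1.3  # assumed example standard deviation of X1
Sigma = np.array([[sigma11**2, 0.4],
                  [0.4,        1.0]])  # assumed example covariance matrix

X = rng.multivariate_normal([0.0, 0.0], Sigma, size=2_000_000)
y = np.maximum(X[:, 0], 0.0)  # Y = X1^+

predicted = sigma11**2 / 2 * (1 - 1 / np.pi)
print(y.var(), predicted)  # sample variance vs. closed form
```

The sample variance should match the closed form to roughly two decimal places at this sample size.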

For more general $\mu_1$, the entry on the folded normal distribution may help.
