[Math] Correlation coefficient of a bivariate normal distribution

probability, probability-distributions

I know that the correlation coefficient of a bivariate normal distribution is given by $$\rho_{X,Y}=\rho,$$ since $$\text{cov}(X,Y)=\rho\sigma_{X}\sigma_{Y}$$ for a bivariate normal distribution, and in general, $$\rho_{X,Y}=\frac{\text{cov}(X,Y)}{\sigma_{X}\sigma_{Y}}.$$ However, without relying on the parameter $\rho$ of the bivariate normal distribution, is there a way to find the correlation coefficient when only $\mu_{X}$, $\sigma_{X}$, $\mu_{Y}$, and $\sigma_{Y}$ are given? Our probability professor said that the correlation coefficient is not always given and can change depending on the other parameters (the parameter $\rho$ not included). She specifically mentioned the mathematical definition of covariance, which is $$\text{cov}(X,Y)=E[(X-\mu_{X})(Y-\mu_{Y})].$$ So we have $$\text{cov}(X,Y)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_{X})(y-\mu_{Y})f(x,y)\,dx\,dy$$ where $f(x,y)$ is the bivariate normal joint density. However, following this definition does not eliminate the parameter $\rho$, since $f(x,y)$ itself depends on it, and I believe the integral still evaluates to $\rho\sigma_{X}\sigma_{Y}$.

So, is the correlation coefficient of a bivariate normal distribution always a given parameter (i.e., it can be set freely depending on how the two variables are related), or can it be derived from the four parameters mentioned above?

Best Answer

The parameter $\rho$ cannot be derived from the other four parameters; it is an independent parameter on which the joint density $f(x,y)$ depends. You can see why this is the case by considering two examples. Imagine two random variables $X$ and $Y$:

  • $\color{blue}{\text{CASE 1}}$: $X$: height of a person, $Y$: savings in their bank account

  • $\color{red}{\text{CASE 2}}$: $X$: height of a person, $Y$: weight

In $\color{blue}{\text{CASE 1}}$ you would expect almost no correlation between the variables $X$ and $Y$; that is, how tall a person is has little to no impact on how much money they have saved in their bank account. If you were to plot samples from these two variables, you'd get a cloud of points with no trend whatsoever.

$\color{red}{\text{CASE 2}}$ is a whole different story. You can expect that a taller person is in general heavier, so a correlation between $X$ and $Y$ must exist in this case. In the figure below I show an example (au stands for arbitrary units).

[Figure: example plot; axes in arbitrary units (au)]

The question is: how do you tell the difference between these two cases based just on $\mu_{X}$, $\sigma_{X}$, $\mu_{Y}$ and $\sigma_{Y}$? The answer is: you can't. You need another number to express the correlation; that's where the number $\rho$ comes into play!
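To make the point concrete, here is a minimal numerical sketch in Python with NumPy. It draws samples from two bivariate normal distributions that share the same $\mu_{X}$, $\sigma_{X}$, $\mu_{Y}$ and $\sigma_{Y}$ but use different values of $\rho$. The numbers (means of 170 and 70, standard deviations of 10 and 15, $\rho=0$ versus $\rho=0.8$) and the helper names `sample_bivariate_normal` and `summarize` are illustrative assumptions, not values taken from the figure above.

```python
import numpy as np


def sample_bivariate_normal(mu_x, mu_y, sigma_x, sigma_y, rho, n, seed=0):
    """Draw n samples from a bivariate normal with the five given parameters."""
    rng = np.random.default_rng(seed)
    cov_xy = rho * sigma_x * sigma_y  # cov(X, Y) = rho * sigma_X * sigma_Y
    cov = np.array([[sigma_x**2, cov_xy],
                    [cov_xy, sigma_y**2]])
    return rng.multivariate_normal([mu_x, mu_y], cov, size=n)


def summarize(samples):
    """Return sample means, sample standard deviations and sample correlation."""
    x, y = samples[:, 0], samples[:, 1]
    return {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "std_x": x.std(ddof=1),
        "std_y": y.std(ddof=1),
        "corr": np.corrcoef(x, y)[0, 1],
    }


# Hypothetical parameter values, chosen only for illustration.
params = dict(mu_x=170.0, mu_y=70.0, sigma_x=10.0, sigma_y=15.0, n=200_000)

case1 = summarize(sample_bivariate_normal(rho=0.0, **params))  # uncorrelated
case2 = summarize(sample_bivariate_normal(rho=0.8, **params))  # correlated

for name, stats in [("rho = 0.0", case1), ("rho = 0.8", case2)]:
    print(name, {k: round(float(v), 2) for k, v in stats.items()})
```

Under these assumptions, the sample means and standard deviations of the two cases should agree up to sampling noise, while the sample correlations differ. That is the sense in which the four marginal parameters alone cannot determine $\rho$.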
