I have derived the bivariate normal density function. However, I am unaware of what happens when the correlation coefficient $\rho$ tends to $1$ or $-1$.
Solved – Bivariate normal distribution with $|\rho|=1$
bivariate, correlation, normal distribution
Related Solutions
A quick Google search led me to this article: Estimating the Correlation in Bivariate Normal Data With Known Variances and Small Sample Sizes. The article discusses several possible priors: "uniform", "Jeffreys", and "arc-sine". Taking the "uniform" prior as an example, it assumes that $\rho$ follows a uniform distribution on $[-1,1]$.
This leads to a density of the form: $$ f(\rho | X, Y) \propto \prod\limits_i{P(x_i, y_i | \rho)} $$
If a full Bayesian estimator is desired, then equation $\hat{\rho}^{(6)}$ on page 35 of that article needs to be computed. This can be done using Monte Carlo integration.
Or, if one only needs the MAP estimate of $\rho$, it can be obtained using Gibbs sampling. I followed one citation of the first article: (Barnard 2000) Modeling Covariance Matrices in terms of Standard Deviations and Correlations, with Application to Shrinkage. That paper discusses in detail how a uniform prior for $\rho$ arises from an inverse-Wishart distribution on the covariance matrix $\Sigma$, and how the Gibbs sampling is carried out (equation (8)).
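As a sketch of the grid-based alternative to sampling: under the uniform prior, the posterior over $\rho$ is proportional to the likelihood, so for known unit variances it can simply be evaluated on a fine grid over $(-1,1)$ and normalized numerically. The data below are simulated (a made-up $\rho = 0.6$ and sample size chosen only for illustration); this is not the article's exact estimator, just the same idea in its simplest form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n draws from a standard bivariate normal with rho = 0.6
# (known unit variances, as in the article's setting).
true_rho, n = 0.6, 200
xy = rng.multivariate_normal([0.0, 0.0],
                             [[1.0, true_rho], [true_rho, 1.0]], size=n)
x, y = xy[:, 0], xy[:, 1]

def log_likelihood(rho):
    """Log-likelihood of rho for a standard bivariate normal sample."""
    q = (x**2 - 2.0 * rho * x * y + y**2) / (2.0 * (1.0 - rho**2))
    return -q.sum() - n * (np.log(2.0 * np.pi) + 0.5 * np.log(1.0 - rho**2))

# Uniform prior on (-1, 1): the posterior is proportional to the likelihood.
grid = np.linspace(-0.999, 0.999, 2001)
step = grid[1] - grid[0]
log_post = np.array([log_likelihood(r) for r in grid])
post = np.exp(log_post - log_post.max())   # subtract max to avoid underflow
post /= post.sum() * step                  # normalize to a density on the grid

rho_mean = np.sum(grid * post) * step      # posterior-mean (Bayes) estimate
rho_map = grid[np.argmax(post)]            # MAP estimate
print(rho_mean, rho_map)
```

With a moderate sample size the posterior is sharply peaked, so the posterior mean and the MAP estimate land close together; the grid evaluation replaces both the Monte Carlo integration and the Gibbs sampler in this one-parameter setting.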
I also searched for other references, but was unable to find a conjugate prior for $\rho$ or an approach that yields a Bayesian estimate of $\rho$ without numerical integration or sampling. It is an interesting problem to look at, by the way.
Yes.
By definition, the value of the CDF (call it $F_\rho$) at $(s,t)$ is the chance that the first component is less than or equal to $s$ and the second is less than or equal to $t$:
$$F_\rho(s,t) = \frac{1}{2 \pi \sqrt{1-\rho ^2}}\int_{-\infty}^t \int_{-\infty}^s e^{-\frac{x^2-2 \rho x y+y^2}{2\left(1-\rho ^2\right)}}\, dx\, dy.$$
Performing the $x$ integration and then differentiating with respect to $\rho$ under the integral sign yields
$$\frac{-1}{2 \pi \left(1-\rho ^2\right)^{3/2}} \int_{-\infty}^t e^{\frac{s^2-2 \rho s y+y^2}{2 \left(\rho ^2-1\right)}} (y-\rho s) dy.$$
This can be integrated directly to produce the PDF $$f_\rho(s,t) = \frac{1}{2 \pi \sqrt{1-\rho ^2}}e^{-\frac{s^2-2 \rho s t+t^2}{2\left(1-\rho ^2\right)}}.$$
Because the integrands are so well-behaved, we may reverse the order of integration and differentiation, concluding that for all $(s,t)$,
$$\frac{\partial}{\partial \rho} F_\rho(s,t) = f_\rho(s,t).$$
Because $f_\rho(s,t)\gt 0$ everywhere, $F_\rho(s,t)$ is strictly increasing in $\rho$ for every fixed $(s,t)$, QED.
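As a numerical sanity check on the identity $\partial F_\rho/\partial\rho = f_\rho$, one can compute $F_\rho$ by two-dimensional quadrature and compare a central finite difference in $\rho$ against the closed-form density. The test point $(s,t)$, the value of $\rho$, and the step size below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import dblquad

def density(x, y, rho):
    """Standard bivariate normal density with correlation rho."""
    q = (x**2 - 2.0 * rho * x * y + y**2) / (2.0 * (1.0 - rho**2))
    return np.exp(-q) / (2.0 * np.pi * np.sqrt(1.0 - rho**2))

def cdf(s, t, rho):
    """F_rho(s, t) by numerical integration of the density over the quadrant."""
    val, _ = dblquad(lambda u, v: density(u, v, rho),
                     -np.inf, t,     # outer variable up to t
                     -np.inf, s)     # inner variable up to s
    return val

s, t, rho, h = 0.5, -0.3, 0.4, 1e-4
dF = (cdf(s, t, rho + h) - cdf(s, t, rho - h)) / (2.0 * h)  # central difference
print(dF, density(s, t, rho))  # the two values should agree closely
```

The quadrature is accurate to roughly `1e-8`, so after dividing by `2h` the finite difference matches the density to a few decimal places.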
Best Answer
A simple-minded (that is, non-measure-theoretic) version of the answer is as follows.
If random variables $X$ and $Y$ are such that

- every point $(x,y)$ in a region $\mathcal A$ of the plane is a possible realization of $(X,Y)$, and
- the area of $\mathcal A$ is greater than $0$,

then $X$ and $Y$ are said to be jointly continuous random variables, and their probabilistic behavior can be determined from their joint density function $f_{X,Y}(x,y)$, whose support is $\mathcal A$. Note that $X$ and $Y$ are also (marginally) continuous random variables.
But $X$ and $Y$ are jointly continuous (and thus enjoy the bivariate normal joint density function that you have found or been told about) only if their (Pearson) correlation coefficient $\rho \in (-1,1)$. When $\rho = \pm 1$, $X$ and $Y$ are not jointly continuous and have no joint density function. They do, however, continue to enjoy the properties stated in the highlighted paragraph above; that is, they are still said to have a bivariate normal distribution (even though they don't have the bivariate normal density), and they are individually normal random variables (and hence continuous).

Note that in this case, all realizations of $(X,Y)$ lie on the straight line $$y = \mu_Y \pm \frac{\sigma_Y}{\sigma_X}(x-\mu_X),$$ with the sign matching that of $\rho$, passing through $(\mu_X,\mu_Y)$; a straight line has zero area. Since $Y = \mu_Y \pm \frac{\sigma_Y}{\sigma_X}(X-\mu_X)$, any question about the probabilistic behavior of $(X,Y)$ can be translated into a question about the probabilistic behavior of $X$ alone and answered based on the knowledge that $X \sim N(\mu_X,\sigma_X^2)$. Since it is equally true that $X = \mu_X \pm \frac{\sigma_X}{\sigma_Y}(Y-\mu_Y)$, contrary-minded folks might prefer to translate the question about $(X,Y)$ into a question about the probabilistic behavior of $Y$ alone and answer it based on the knowledge that $Y \sim N(\mu_Y,\sigma_Y^2)$.
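The degenerate case is easy to see in simulation. Below, with made-up values for $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$ and taking $\rho = +1$ for concreteness, we draw $X \sim N(\mu_X,\sigma_X^2)$ and set $Y$ by the linear relation above: every sample point lies exactly on the line (a set of zero area, hence no joint density), the sample correlation is $1$ up to floating point, and $Y$ is itself marginally $N(\mu_Y,\sigma_Y^2)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters, chosen only for illustration.
mu_x, mu_y, sigma_x, sigma_y = 2.0, -1.0, 1.5, 0.5

# rho = +1: Y is a deterministic, increasing linear function of X.
x = rng.normal(mu_x, sigma_x, size=100_000)
y = mu_y + (sigma_y / sigma_x) * (x - mu_x)

# Every realization lies exactly on the straight line, so the sample
# correlation equals 1 up to floating-point error, while each marginal
# remains normal.
r = np.corrcoef(x, y)[0, 1]
print(r)                  # 1.0 up to floating-point error
print(y.mean(), y.std())  # close to mu_y = -1.0 and sigma_y = 0.5
```

For $\rho = -1$ one would instead set `y = mu_y - (sigma_y / sigma_x) * (x - mu_x)` and find a sample correlation of $-1$.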