I think I have found the discrepancy. The short answer is that the distribution in case (2) is a doubly noncentral beta distribution; the bulk correlation is the same in each case.
In the solution below I've used slightly different notation; in particular, $\boldsymbol{s}$ $\rightarrow$ $\boldsymbol{w}$. I've also used $\hat{A}$ for the maximum likelihood estimate of the amplitude of the sample vector $\boldsymbol{w}$ that maximally correlates with $\boldsymbol{x}$. This notation and formulation is aimed at detection theory applications.
The relative squared error in approximating a data stream $\boldsymbol{x}$ with a waveform template $\boldsymbol{w}$ is the ratio of the least-squares error $\vert\vert \boldsymbol{e} \vert\vert^{2}$ to the measured signal energy $\vert\vert \boldsymbol{x} \vert\vert^{2}$. This error can be rewritten as a ratio of quadratic forms that has a doubly noncentral beta distribution:
\begin{equation}
\begin{split}
\cfrac{\vert \vert \boldsymbol{e} \vert \vert^{2}}{\vert \vert \boldsymbol{x} \vert \vert^{2}} &= \cfrac{\vert \vert \boldsymbol{x} - \hat{A} \boldsymbol{w } \vert \vert^{2} }{ \vert \vert \boldsymbol{x} \vert \vert^{2}}
\\
&= \cfrac{\vert \vert \boldsymbol{x} - \cfrac{\langle \boldsymbol{x},\, \boldsymbol{w} \rangle}{ \vert \vert \boldsymbol{w } \vert \vert^{2}} \boldsymbol{w } \vert \vert^{2}} {\vert \vert \boldsymbol{x} \vert \vert^{2}}
\\
&= \cfrac{ \vert \vert P_{\boldsymbol{w}}^{\perp} \left( \boldsymbol{x} \right) \vert \vert ^{2}} { \vert \vert P_{\boldsymbol{w}}^{\perp}\left( \boldsymbol{x} \right) \vert \vert ^{2} + \vert \vert P_{\boldsymbol{w}} \left( \boldsymbol{x} \right) \vert \vert ^{2}}
\\
&\overset{d}{=} \cfrac{ \chi_{N_{E} - 1}^{2}( \lambda^{\perp} )} { \chi_{1}^{2}( \lambda ) + \chi_{N_{E} - 1}^{2}( \lambda^{\perp} ) },
\end{split}
\end{equation}
where the noncentrality parameters are defined by $\lambda$ $=$ $\cfrac{\vert \vert P_{\boldsymbol{w}} \left( \boldsymbol{x} \right) \vert \vert ^{2}}{\sigma^{2}}$ and $\lambda^{\perp}$ $=$ $\cfrac{\vert \vert P_{\boldsymbol{w}}^{\perp} \left( \boldsymbol{x} \right) \vert \vert ^{2}}{\sigma^{2}}$, and where $\overset{d}{=}$ indicates distributional equality. This ratio is also related to the sample correlation coefficient $r$:
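The projection decomposition above is easy to verify numerically. Here is a minimal sketch (the specific `x` and `w` are arbitrary placeholder vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data stream and template; any nonzero w works.
x = rng.standard_normal(8)
w = rng.standard_normal(8)

# Least-squares amplitude estimate: A_hat = <x, w> / ||w||^2
A_hat = (x @ w) / (w @ w)

# Components of x parallel and orthogonal to span{w}
P_par = A_hat * w
P_perp = x - P_par

# Relative squared error two ways: directly, and via the projection split
lhs = np.sum((x - A_hat * w) ** 2) / np.sum(x ** 2)
rhs = np.sum(P_perp ** 2) / (np.sum(P_perp ** 2) + np.sum(P_par ** 2))

print(abs(lhs - rhs))  # agrees to floating-point precision
```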
\begin{equation}
\begin{split}
\cfrac{\vert \vert \boldsymbol{x} - \cfrac{\langle \boldsymbol{x},\, \boldsymbol{w} \rangle}{ \vert \vert \boldsymbol{w } \vert \vert^{2}} \boldsymbol{w } \vert \vert^{2}} {\vert \vert \boldsymbol{x} \vert \vert^{2}}
&=
1- \cfrac{\langle \boldsymbol{x},\, \boldsymbol{w} \rangle^{2} }{ \vert \vert \boldsymbol{w } \vert \vert^{2} \vert \vert \boldsymbol{x } \vert \vert^{2}}
\\
&=
1 - r^{2}
\end{split}
\end{equation}
Therefore:
\begin{equation}
\begin{split}
r^{2} &\overset{d}{=} \cfrac{ \chi_{1}^{2}( \lambda )} { \chi_{1}^{2}( \lambda ) + \chi_{N_{E} - 1}^{2}( \lambda^{\perp} ) }
\\
&\sim \text{Beta} \left( \frac{1}{2}, \frac{N_{E}-1}{2} ; \lambda, \lambda^{\perp} \right)
\end{split}
\end{equation}
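The distributional equality can be checked by Monte Carlo. The sketch below assumes a signal-plus-noise model $\boldsymbol{x} = \boldsymbol{\mu} + \boldsymbol{n}$ and computes the noncentralities from the noise-free component $\boldsymbol{\mu}$ (that modeling choice, along with the particular `w`, `mu`, and sizes, is illustrative, not from the original derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_trials = 16, 50_000  # N_E = 16, unit noise variance

# Unit-norm template and a signal mean not exactly along w (both arbitrary)
w = rng.standard_normal(N)
w /= np.linalg.norm(w)
mu = 2.0 * w + 0.5 * rng.standard_normal(N)

# Noncentralities from the noise-free signal component (sigma = 1)
lam = (mu @ w) ** 2
lam_perp = mu @ mu - (mu @ w) ** 2

# Direct simulation of r^2 from noisy data streams
x = mu + rng.standard_normal((n_trials, N))
r2 = (x @ w) ** 2 / np.sum(x ** 2, axis=1)

# Construction from independent noncentral chi-squares:
#   chi2_1(lam)           = (Z + sqrt(lam))^2
#   chi2_{N-1}(lam_perp)  = (Z + sqrt(lam_perp))^2 + sum of N-2 central Z^2
num = (rng.standard_normal(n_trials) + np.sqrt(lam)) ** 2
den_rest = ((rng.standard_normal(n_trials) + np.sqrt(lam_perp)) ** 2
            + np.sum(rng.standard_normal((n_trials, N - 2)) ** 2, axis=1))
r2_chi = num / (num + den_rest)

print(r2.mean(), r2_chi.mean())  # the two Monte Carlo means agree closely
```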
Distinct hypotheses about the distribution of $\boldsymbol{x}$ simplify the form of this distribution. When the data stream contains only noise, the hypothesis $\mathcal{H}_{0}$ holds, $\lambda^{\perp}$ $=$ $\lambda$ $=$ $0$, and $r^{2}$ has a central Beta distribution. In the presence of signal, a data stream $\boldsymbol{x}$ will generally have a non-zero projection $P_{\boldsymbol{w}}^{\perp}\left( \boldsymbol{x} \right)$ orthogonal to the noise-contaminated template vector $\boldsymbol{w}$; in this case $\lambda$, $\lambda^{\perp}$ $\ne$ $0$, and $r^{2}$ has a doubly noncentral Beta distribution. If the template signal has a very large SNR, then $ \boldsymbol{x}$ $\cong$ $A \boldsymbol{w}$ $+$ $\boldsymbol{n}$, $\lambda^{\perp}$ $=$ $0$, and $r^{2}$ is well approximated by a (singly) noncentral Beta distribution. The noncentral Beta distribution therefore provides an absolute upper bound on the detection performance of a correlation detector.
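The $\mathcal{H}_{0}$ case has a convenient analytic check: a central $\text{Beta}(a, b)$ with $a = \tfrac{1}{2}$, $b = \tfrac{N_E - 1}{2}$ has mean $a/(a+b) = 1/N_E$, so the sample mean of $r^2$ under pure noise should land near $1/N_E$. A minimal sketch (template and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_trials = 10, 100_000  # N_E = 10

w = rng.standard_normal(N)
w /= np.linalg.norm(w)

# Under H0 the data stream is pure noise
x = rng.standard_normal((n_trials, N))
r2 = (x @ w) ** 2 / np.sum(x ** 2, axis=1)

# Central Beta(1/2, (N-1)/2) has mean 1/N
print(r2.mean())  # close to 1/N = 0.1
```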
You have
$$
\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}.
$$
The determinant of this matrix is $\sqrt{1-\rho^2}$.
You have the density
$$
f_{X,Y}(x,y) = \frac{1}{2\pi} \exp\left( \frac{-1}{2}(x^2+y^2) \right)
$$
and
$$
\begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ \frac{-\rho}{\sqrt{1-\rho^2}} & \frac{1}{\sqrt{1-\rho^2}} \end{bmatrix}
$$
and the determinant of this matrix is $\frac{1}{\sqrt{1-\rho^2}}$.
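Both the inverse and the determinants are quick to confirm numerically (the value $\rho = 0.6$ below is just an example):

```python
import numpy as np

rho = 0.6  # any value in (-1, 1)
s = np.sqrt(1 - rho ** 2)

A = np.array([[1.0, 0.0], [rho, s]])
A_inv = np.array([[1.0, 0.0], [-rho / s, 1.0 / s]])

print(np.linalg.det(A))      # sqrt(1 - rho^2)
print(np.linalg.det(A_inv))  # 1 / sqrt(1 - rho^2)
print(np.allclose(A @ A_inv, np.eye(2)))  # True: A_inv really is the inverse
```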
That and your assertion about the density will give you the joint density of $W$ and $V$.
If you're looking for the correlation, you can read the covariance and the two variances out of the density function, but that should not be necessary. If you have two random variables $X,Y$ whose covariance matrix is $M$, and you've got
$$
\begin{bmatrix} W \\ V \end{bmatrix} = A \begin{bmatrix} X \\ Y \end{bmatrix},
$$
then the covariance matrix of $\begin{bmatrix} W \\ V \end{bmatrix}$ is
$$
AMA^T.
$$
In this case that is
$$
\begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1-\rho^2} \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}.
$$
That gives you $\operatorname{cov}(W,V)$ and the two variances, and since both variances are $1$, the correlation is the covariance.
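The $AMA^T$ computation takes only a couple of lines of NumPy to reproduce ($\rho = 0.6$ is an arbitrary example value):

```python
import numpy as np

rho = 0.6
A = np.array([[1.0, 0.0], [rho, np.sqrt(1 - rho ** 2)]])
M = np.eye(2)  # covariance matrix of independent standard normals X, Y

# Covariance of (W, V) = A (X, Y): A M A^T
cov_WV = A @ M @ A.T
print(cov_WV)  # [[1, rho], [rho, 1]]
```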
Best Answer
The parameter $\rho$ cannot be derived from the other four parameters; that is, the distribution function $f(x,y)$ depends on $\rho$ in its own right. You can see why this is the case by considering two examples. Imagine two random variables $X$ and $Y$:
$\color{blue}{\text{CASE 1}}$: $X$: height of a person, $Y$: savings in their bank account
$\color{red}{\text{CASE 2}}$: $X$: height of a person, $Y$: weight
In $\color{blue}{\text{CASE 1}}$ you would expect almost no correlation between $X$ and $Y$: how tall a person is has little to no impact on how much money they have saved in their bank account. If you were to plot samples of these two variables, you'd get a cloud of points with no trend whatsoever.
$\color{red}{\text{CASE 2}}$ is a whole different story. You can expect that a taller person is, in general, heavier, so a correlation between $X$ and $Y$ must exist in this case. The figure below shows an example (a.u. stands for arbitrary units).
The question is: how do you tell the difference between these two cases based just on $\sigma_{X}$, $\sigma_{Y}$, $\mu_{X}$, and $\mu_{Y}$? The answer is: you can't. You need another number to express the correlation, and that's where $\rho$ comes into play!
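To make the point concrete, here is a small simulation sketch (sample size and the two $\rho$ values are arbitrary): two pairs of variables with identical marginal means and standard deviations but different $\rho$. The marginal summaries cannot tell the cases apart, while the sample correlation separates them immediately.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def sample(rho):
    # Both cases share mu = 0 and sigma = 1 for each marginal; only rho differs.
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    return x, y

x1, y1 = sample(0.0)  # CASE 1: height vs. savings -- essentially uncorrelated
x2, y2 = sample(0.8)  # CASE 2: height vs. weight  -- strongly correlated

# The marginal summaries are indistinguishable between the two cases...
print(y1.mean(), y1.std(), y2.mean(), y2.std())

# ...but the sample correlation tells them apart.
c1 = np.corrcoef(x1, y1)[0, 1]
c2 = np.corrcoef(x2, y2)[0, 1]
print(c1, c2)
```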