I think I have found the discrepancy. The short answer is that the distribution in case (2) is a doubly noncentral beta distribution; the bulk correlation is the same in each case.
In the solution below, I've used slightly different notation. In particular, $\boldsymbol{s}$ $\rightarrow$ $\boldsymbol{w}$. I've also used $\hat{A}$ to denote the maximum likelihood estimate of the amplitude of the sample vector $\boldsymbol{w}$ that maximally correlates with $\boldsymbol{x}$. This notation and formulation is targeted at detection theory applications.
The relative squared error in approximating a data stream $ \boldsymbol{x} $ with a waveform template $\boldsymbol{w}$ is the ratio of the least-squares error $\vert\vert \boldsymbol{e} \vert\vert^{2}$ to the measured signal energy $ \vert \vert \boldsymbol{x} \vert \vert^{2}$. This error may be rewritten as a ratio of quadratic forms that has a doubly noncentral $\beta$ distribution:
\begin{equation}
\begin{split}
\cfrac{\vert \vert \boldsymbol{e} \vert \vert^{2}}{\vert \vert \boldsymbol{x} \vert \vert^{2}} &= \cfrac{\vert \vert \boldsymbol{x} - \hat{A} \boldsymbol{w } \vert \vert^{2} }{ \vert \vert \boldsymbol{x} \vert \vert^{2}}
\\
&= \cfrac{\vert \vert \boldsymbol{x} - \cfrac{\langle \boldsymbol{x},\, \boldsymbol{w} \rangle}{ \vert \vert \boldsymbol{w } \vert \vert^{2}} \boldsymbol{w } \vert \vert^{2}} {\vert \vert \boldsymbol{x} \vert \vert^{2}}
\\
&= \cfrac{ \vert \vert P_{\boldsymbol{w}}^{\perp} \left( \boldsymbol{x} \right) \vert \vert ^{2}} { \vert \vert P_{\boldsymbol{w}}^{\perp}\left( \boldsymbol{x} \right) \vert \vert ^{2} + \vert \vert P_{\boldsymbol{w}} \left( \boldsymbol{x} \right) \vert \vert ^{2}}
\\
&\overset{d}{=} \cfrac{ \chi_{N_{E} - 1}^{2}( \lambda^{\perp} )} { \chi_{N_{E} - 1}^{2}( \lambda^{\perp} ) + \chi_{1}^{2}( \lambda ) },
\end{split}
\end{equation}
where the noncentrality parameters are defined by $\lambda$ $=$ $\cfrac{\vert \vert P_{\boldsymbol{w}} \left( \boldsymbol{x} \right) \vert \vert ^{2}}{\sigma^{2}}$ and $\lambda^{\perp}$ $=$ $\cfrac{\vert \vert P_{\boldsymbol{w}}^{\perp} \left( \boldsymbol{x} \right) \vert \vert ^{2}}{\sigma^{2}}$, and where $\overset{d}{=}$ indicates distributional equality. This ratio is also related to the sample correlation coefficient $r$:
\begin{equation}
\begin{split}
\cfrac{\vert \vert \boldsymbol{x} - \cfrac{\langle \boldsymbol{x},\, \boldsymbol{w} \rangle}{ \vert \vert \boldsymbol{w } \vert \vert^{2}} \boldsymbol{w } \vert \vert^{2}} {\vert \vert \boldsymbol{x} \vert \vert^{2}}
&=
1- \cfrac{\langle \boldsymbol{x},\, \boldsymbol{w} \rangle^{2} }{ \vert \vert \boldsymbol{w } \vert \vert^{2} \vert \vert \boldsymbol{x } \vert \vert^{2}}
\\
&=
1 - r^{2}
\end{split}
\end{equation}
Therefore:
\begin{equation}
\begin{split}
r^{2} &\overset{d}{=} \cfrac{ \chi_{1}^{2}( \lambda )} { \chi_{1}^{2}( \lambda ) + \chi_{N_{E} - 1}^{2}( \lambda^{\perp} ) }
\\
&\sim \text{Beta} \left( \frac{1}{2}, \frac{1}{2} \left( N_{E} - 1 \right) ; \lambda, \lambda^{\perp} \right)
\end{split}
\end{equation}
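The distributional identity for $r^{2}$ can be spot-checked by Monte Carlo. In the sketch below, every parameter ($N_{E}$, $\sigma$, the unit-norm template $\boldsymbol{w}$, and the mean signal) is an arbitrary choice for illustration; $r^{2}$ is simulated directly from Gaussian data and, separately, via the noncentral chi-square ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
N_E, sigma = 8, 1.0                  # sample length and noise std (assumed)
w = np.zeros(N_E); w[0] = 1.0        # unit-norm template
mu = np.zeros(N_E)                   # mean signal: part along w, part orthogonal
mu[0], mu[1] = 2.0, 0.5

lam = (w @ mu) ** 2 / sigma**2                    # ||P_w(mu)||^2 / sigma^2
lam_perp = (mu @ mu - (w @ mu) ** 2) / sigma**2   # ||P_w^perp(mu)||^2 / sigma^2

M = 200_000
# Direct route: x = mu + noise, r^2 = <x, w>^2 / ||x||^2  (w has unit norm)
x = mu + sigma * rng.standard_normal((M, N_E))
r2_direct = (x @ w) ** 2 / np.sum(x**2, axis=1)

# Chi-square route: r^2 = chi2_1(lam) / (chi2_1(lam) + chi2_{N_E-1}(lam_perp))
num = rng.noncentral_chisquare(1, lam, M)
r2_chi = num / (num + rng.noncentral_chisquare(N_E - 1, lam_perp, M))

print(r2_direct.mean(), r2_chi.mean())   # the two means should agree closely
```

The two empirical means agree to roughly three decimal places at this sample size, consistent with the claimed equality in distribution.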
Distinct hypotheses for the distribution of $\boldsymbol{x}$ simplify the form of this distribution. When the data stream contains only noise, hypothesis $\mathcal{H}_{0}$ holds, $\lambda$ $=$ $\lambda^{\perp}$ $=$ $0$, and $r^{2}$ has a central Beta distribution. In the presence of signal, a data stream $\boldsymbol{x}$ will generally have a non-zero projection $P_{\boldsymbol{w}}^{\perp}\left( \boldsymbol{x} \right)$ orthogonal to the noise-contaminated template vector $\boldsymbol{w}$; in this case $\lambda$, $\lambda^{\perp}$ $\ne$ $0$, and $r^{2}$ has a doubly noncentral Beta distribution. If the template signal has a very large SNR, then $ \boldsymbol{x}$ $\cong$ $A \boldsymbol{w}$ $+$ $\boldsymbol{n}$, $\lambda^{\perp}$ $\approx$ $0$, and $r^{2}$ is well approximated by a (singly) noncentral Beta distribution. The noncentral Beta distribution therefore provides an absolute upper bound on the detection performance of a correlation detector.
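The $\mathcal{H}_{0}$ case has a closed form and is easy to verify numerically. A sketch (sample length and template are arbitrary assumed choices) compares the empirical quantiles of $r^{2}$ under pure noise against $\text{Beta}\left(\frac{1}{2}, \frac{1}{2}(N_{E}-1)\right)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N_E, M = 8, 100_000
w = rng.standard_normal(N_E)
w /= np.linalg.norm(w)                   # arbitrary unit-norm template

x = rng.standard_normal((M, N_E))        # pure noise: hypothesis H0
r2 = (x @ w) ** 2 / np.sum(x**2, axis=1)

# Under H0, r^2 ~ Beta(1/2, (N_E - 1)/2): compare a few quantiles
q = np.array([0.25, 0.5, 0.75])
print(np.quantile(r2, q))
print(stats.beta.ppf(q, 0.5, (N_E - 1) / 2))
```

The empirical and theoretical quantiles match to within Monte Carlo error, as the central-Beta claim requires.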
We have $\mathbb P(Y_k=i)=p_i$ for $i=1,2,3,4,5,6$ and $k=1,2,\ldots,n$. Then $X_i = \sum_{k=1}^n \mathsf 1_{\{Y_k=i\}}$. It follows that
\begin{align}
\operatorname{Cov}(X_1,X_2) &= \mathbb E[X_1X_2]-\mathbb E[X_1]\mathbb E[X_2]\\
&= \mathbb E\left[\left(\sum_{k=1}^n\mathsf 1_{\{Y_k=1\}}\right)\left(\sum_{k=1}^n\mathsf 1_{\{Y_k=2\}}\right)\right] - (np_1)(np_2).
\end{align}
Now, $\mathsf 1_{\{Y_i=1\}}\mathsf 1_{\{Y_i=2\}}=0$ so $\mathbb E\left[\mathsf 1_{\{Y_i=1\}}\mathsf 1_{\{Y_i=2\}}\right]=0$ and for $i\ne j$, $$\mathbb E\left[\mathsf 1_{\{Y_i=1\}}\mathsf 1_{\{Y_j=2\}}\right]=\mathbb P(Y_i=1,Y_j=2)=\mathbb P(Y_i=1)\mathbb P(Y_j=2) = p_1p_2.$$
Hence
$$\operatorname{Cov}(X_1,X_2) = (n^2-n)p_1p_2 -n^2p_1p_2 =-np_1p_2<0, $$
so $X_1$ and $X_2$ are negatively correlated.
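The formula $\operatorname{Cov}(X_1,X_2) = -np_1p_2$ can be confirmed by simulation; a sketch with an arbitrary assumed choice of $n$ and the $p_i$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
p = np.array([0.1, 0.2, 0.15, 0.15, 0.25, 0.15])   # a loaded die (assumed)

M = 200_000
counts = rng.multinomial(n, p, size=M)   # each row is (X_1, ..., X_6)
X1, X2 = counts[:, 0], counts[:, 1]

sample_cov = np.cov(X1, X2)[0, 1]
print(sample_cov, -n * p[0] * p[1])      # both near -1.2
```

The sample covariance lands near the theoretical value $-np_1p_2 = -1.2$, and in particular is negative.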
The most straightforward (but not the only) way to get a pair of random variables correlated in the way that you want is to take $X \sim \operatorname{Uniform}(x_{\text{low}},x_{\text{up}})$ and define $Y$ as follows: fix a mixing probability $p \in [0,1]$ and an independent copy $X' \sim \operatorname{Uniform}(x_{\text{low}},x_{\text{up}})$, then set
$$Y = \begin{cases} X & \text{with probability } p,\\ X' & \text{with probability } 1-p,\end{cases}$$
using $x_{\text{low}} + x_{\text{up}} - X$ in place of $X$ for negative correlation; this gives $\operatorname{Corr}(X,Y) = \pm p$.
In other words, $Y$ is a mixture of a random variable independent from $X$, and a random variable perfectly linearly correlated with $X$ (either positively or negatively, depending on which you want).
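A minimal sketch of such a mixture (the mixing probability $p$ and the uniform bounds are assumed values for illustration): with probability $p$, $Y$ copies $X$ (or its reflection, for negative correlation); otherwise $Y$ is an independent draw, so the resulting correlation is $\pm p$:

```python
import numpy as np

rng = np.random.default_rng(3)
x_low, x_up = 0.0, 1.0
p = 0.6                          # target |correlation| (assumed)
M = 500_000

X = rng.uniform(x_low, x_up, M)
Xp = rng.uniform(x_low, x_up, M)        # independent copy of X
mix = rng.uniform(size=M) < p           # take the correlated branch w.p. p

Y_pos = np.where(mix, X, Xp)                   # corr(X, Y_pos) =  p
Y_neg = np.where(mix, x_low + x_up - X, Xp)    # corr(X, Y_neg) = -p

print(np.corrcoef(X, Y_pos)[0, 1], np.corrcoef(X, Y_neg)[0, 1])
```

The sample correlations come out near $+0.6$ and $-0.6$ respectively, matching the mixture weight.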