[Math] Maximum and minimum Correlation Coefficient

correlationstatistics

I have a question regarding the correlation coefficient. The inspiration is from a story where a student collected a set of $(X,Y)$ pairs, but lost the pairings. Hence, he is left with two sets of values of $X$ and $Y$. Say we have eight readings each and the values of $X$ are $\left\{ 4, 6, 7, 3, 3, 1, 9, 4\right\}$ and the the values of $Y$ are $\left\{ 4, 5, 2, 2, 4, 2,7, 9\right\}$.
I am asked to find the highest and lowest possible correlation coefficient given these values. Now, the naive way to find this is to maximize and minimze $\mathbb E (XY)$ since $\rho_{X,Y}= (\mathbb E(XY) – \mathbb E (X) \mathbb E(Y))/\sigma_X \sigma_y$ and the respective (unconditioned, of course) means and variances are fixed.

So what I did was to arrange the values of $X$ and $Y$ in ascending order, multiply the paired values to get $\mathbb E_{\text{max}} (XY) = 25.75$ and so $\rho_{X,Y_{\text{max}}}=0.9619$. To get the minimum, I multiply the minimum of $X$ with the maximum of $Y$ and move backwards up the list of $Y$ to get $\mathbb E_{\text{min}} (XY) = 15.13$ and so $\rho_{X,Y_{\text{min}}}=-0.891$.

My question is that is there a more elegant way like a differentiation method to do so? I believe someone here could enlighten me. I do not feel comfortable with this result as it feels like I am brute forcing my way through, and cannot find a way to convincingly say the values are the maximum or minimum.

Best Answer

The method you used of making the rankings the same or reversed will indeed produce the maximum and minimum values for the covariance and the correlation coefficient.

To see this, note that the reordering will not alter the number of observations, or the means or variances of $X$ and $Y$. In effect the only thing that changes is $\sum_i x_i y_i$.

Now if $x_1 \ge x_2$ and $y_1 \ge y_2$ then $x_1 y_1 + x_2 y_2 \ge x_1 y_2 + x_2 y_1$ as the difference is $(x_1-x_2)(y_1-y_2)$.

So $\sum_i x_i y_i$ is maximised when the ranks are the same in both sequences and minimised when the ranks are opposite: if the are not then do pairwise swaps until they are.

Related Question