Yes, they are the same. The Matthews correlation coefficient is simply the Pearson correlation coefficient applied to a binary confusion table.
A contingency table is just a summary of the underlying data; you can expand the counts shown in the contingency table back into one row per observation.
Consider the example confusion matrix used in the Wikipedia article, with 5 true positives, 17 true negatives, 2 false positives, and 3 false negatives:
> matrix(c(5,3,2,17), nrow=2, byrow=TRUE)
     [,1] [,2]
[1,]    5    3
[2,]    2   17
>
> # Matthews correlation coefficient directly from the Wikipedia formula
> (5*17-3*2) / sqrt((5+3)*(5+2)*(17+3)*(17+2))
[1] 0.5415534
>
>
> # Convert this into a long form binary variable and find the correlation coefficient
> conf.m <- data.frame(
+ X1=rep(c(0,1,0,1), c(5,3,2,17)),
+ X2=rep(c(0,0,1,1), c(5,3,2,17)))
> conf.m # what does that look like?
X1 X2
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 1 0
7 1 0
8 1 0
9 0 1
10 0 1
11 1 1
12 1 1
13 1 1
14 1 1
15 1 1
16 1 1
17 1 1
18 1 1
19 1 1
20 1 1
21 1 1
22 1 1
23 1 1
24 1 1
25 1 1
26 1 1
27 1 1
> cor(conf.m)
          X1        X2
X1 1.0000000 0.5415534
X2 0.5415534 1.0000000
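The same cross-check can be done outside R. Here is a small Python sketch (using NumPy; this code is my addition, not part of the original answer) that expands the same confusion matrix into per-observation binary vectors and compares the Pearson correlation with the Wikipedia MCC formula:

```python
import numpy as np

# Confusion matrix counts: TP=5, FN=3, FP=2, TN=17
tp, fn, fp, tn = 5, 3, 2, 17

# MCC directly from the Wikipedia formula
mcc = (tp * tn - fn * fp) / np.sqrt(
    (tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))

# Expand the counts into one row per observation,
# mirroring the R data frame conf.m above
x1 = np.repeat([0, 1, 0, 1], [tp, fn, fp, tn])
x2 = np.repeat([0, 0, 1, 1], [tp, fn, fp, tn])

# Pearson correlation of the two binary vectors
r = np.corrcoef(x1, x2)[0, 1]

print(mcc, r)  # both are approximately 0.5415534
```

The two numbers agree to floating-point precision, matching the R output above.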
It is certainly not true in general. Covariance is a bilinear operator, so the product term is unlikely to decompose into anything constructive. Here is a counterexample:
Assume $X$ is not degenerate. Take $Y=1/X$ a.s. and $W=X$ a.s. Then:
$\rho(XY,W)=\rho(1,X)=0$
$\rho(X,W)=\rho(X,X)=1$
$\rho(Y,W)=\rho(1/X,X)\ne -1$, since $X$ and $1/X$ are not linearly related (a.s.)
So clearly the equality you suggested does not hold.
Note that $\operatorname{cov}(X+Y,W)=\operatorname{cov}(X,W)+\operatorname{cov}(Y,W)$. Assuming that all the variables have unit variance (so that correlations are equal to covariances), the following would have to hold:
$\rho(XY,W)=\rho(X+Y,W)$. This can probably provide some insight into the necessary relationship between $X$ and $Y$.
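As a quick sanity check on the claim that $\rho(1/X, X) \ne -1$ (my own sketch, not part of the original answer): take $X$ uniform on $[1,2]$, where all the required moments of $X$ and $1/X$ have closed forms, and compute the correlation exactly:

```python
from math import log, sqrt

# X ~ Uniform(1, 2): closed-form moments
ex = 1.5                 # E[X]
vx = 1 / 12              # Var(X)
einv = log(2)            # E[1/X]  = integral of 1/x  over [1, 2]
einv2 = 0.5              # E[1/X^2] = integral of 1/x^2 over [1, 2]
vinv = einv2 - einv**2   # Var(1/X)

# cov(X, 1/X) = E[X * (1/X)] - E[X] * E[1/X] = 1 - E[X] * E[1/X]
cov = 1 - ex * einv

rho = cov / sqrt(vx * vinv)
print(rho)  # approximately -0.984: strongly negative, but strictly above -1
```

The correlation is close to $-1$ but not equal to it, because $1/X$ is a convex, not linear, function of $X$ on this interval.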
Best Answer
whuber's much more detailed answer appeared while I was composing this answer of mine (which essentially uses the same argument).
Let $X$ and $Y$ denote two random variables with finite variances $\sigma_X^2$ and $\sigma_Y^2$ respectively and correlation coefficient $\rho = \pm 1$. Then, \begin{align}\operatorname{var}(Y-aX) &= \sigma_Y^2+ a^2\sigma_X^2 - 2a\cdot\operatorname{cov}(Y,X) &\text{standard result}\\ &= \sigma_Y^2+ a^2\sigma_X^2 - 2a\rho\sigma_X\sigma_Y &\text{substitute for}~\operatorname{cov}(Y,X)\\ &= \sigma_Y^2+ a^2\sigma_X^2 \mp 2a\sigma_X\sigma_Y & \text{since}~ \rho = \pm 1\\ &= (\sigma_Y\mp a\sigma_X)^2\\ &= (\sigma_Y - a\rho\sigma_X)^2 & \text{keep remembering that}~ \rho = \pm 1\\ &= 0 &\text{if we choose}~ a = \rho\frac{\sigma_Y}{\sigma_X}. \end{align} Thus, if $\rho = \pm 1$, then $Y-\rho\frac{\sigma_Y}{\sigma_X}X$ is a random variable whose variance is $0$, and so $Y-\rho\frac{\sigma_Y}{\sigma_X}X$ is a constant (almost surely). In other words, $Y = \alpha X + \beta$ (almost surely) and thus $X$ and $Y$ are linearly related (almost surely).
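The derivation above can be illustrated numerically (a Python/NumPy sketch of my own, not part of the original answer): with exactly linearly related data, $\rho = 1$ and the combination $Y - \rho\frac{\sigma_Y}{\sigma_X}X$ collapses to a constant:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 3.0  # exactly linearly related, so rho = 1

rho = np.corrcoef(x, y)[0, 1]
a = rho * y.std() / x.std()  # a = rho * sigma_Y / sigma_X

# Y - aX should have zero variance, i.e. be the constant intercept 3
resid = y - a * x
print(rho, resid.var())
```

Here `resid` equals the intercept $\beta = 3$ everywhere (up to floating-point error), confirming that $Y - \rho\frac{\sigma_Y}{\sigma_X}X$ is constant when $\rho = \pm 1$.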