[Math] Correlation between variables

Tags: correlation, probability, regression, statistics

I asked this question on stats SE but did not find a suitable answer so far. Maybe someone can help.

Suppose we have n one-dimensional random variables x1,…,xn.
The following is known (corr() denotes the Pearson correlation):

corr(x1,x2) = a
corr(x2,x3) = a

The actual values of the random variables and their covariances are unknown, though; only some of their correlations are known.

From this, is it possible to calculate

corr(x3,x1) = ?

or at least give a lower bound for it, i.e. some value b with

corr(x3,x1) >= b

More generally:

Given a set of correlations

corr(x_i, x_{i+1}) for i = 1, …, c, with c < n,

is it possible to either directly calculate

corr(x_1, x_{c+1})

or give a lower bound a for that coefficient, i.e.

corr(x_1, x_{c+1}) >= a

Best Answer

I find it most intuitive to look at such questions via the Cholesky decomposition of a correlation matrix. The Cholesky decomposition provides a lower triangular matrix which (given the variables $\small x_1,x_2,x_3 $) always has the form
$\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a_1 & a_2 & . \\ x_3: & b_1 & b_2 & b_3 \\ \end{array} $
which can be continued to more rows/columns, and where the dots denote (structural) zeros. The squares of the entries of each row sum to 1, and each correlation is the sum of the products of the entries along the two corresponding rows, for instance $\small corr(x_1,x_2)=1 \cdot a_1 $ or $\small corr(x_2,x_3)=a_1 \cdot b_1 + a_2 \cdot b_2 $.
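As a quick sanity check of these two rules, here is a minimal sketch (assuming numpy is available; the correlation matrix below is an arbitrary but valid example) showing that the Cholesky factor of a correlation matrix is lower triangular, has rows of unit norm, and reproduces each correlation as the product of two rows:

```python
import numpy as np

# arbitrary but valid correlation matrix for (x1, x2, x3)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

L = np.linalg.cholesky(R)            # lower triangular factor with R = L @ L.T

print(np.round(L, 4))                # the triangular pattern shown above
print(np.linalg.norm(L, axis=1))     # every row has norm 1
print(L[0] @ L[1], L[1] @ L[2])      # recovers corr(x1,x2)=0.6 and corr(x2,x3)=0.5
```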
If we now want to know the possible range for the correlation $\small corr(x_2,x_3) $ given $\small corr(x_1,x_2)=a $ and $\small corr(x_1,x_3)=b $ (up to relabeling the variables, this is exactly the situation of the question), then we know immediately that $\small a,b $ must be the entries in the first column:
$\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & a_2 & . \\ x_3: & b & b_2 & b_3 \\ \end{array} $
and by the rule that the squares in each row sum to 1 we get (the table below shows the squared entries):
$\qquad \small \begin{array} {r|lll} x_1^*: & 1 & . & . & \\ x_2^*: & a^2 & 1-a^2 & . \\ x_3^*: & b^2 & b_2^2 & 1-b^2-b_2^2 \\ \end{array} $
Here all entries except $\small b_2$ are either fixed or determined by the choice of $\small b_2$, which itself is limited to the obvious interval $\small 0 \le b_2^2 \le 1-b^2$.
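A minimal sketch of this parametrization (assuming numpy; the values of $\small a, b, b_2$ below are arbitrary examples with $\small b_2^2 \le 1-b^2$): every admissible choice of $\small b_2$ produces a valid correlation matrix whose entry $\small corr(x_2,x_3)$ equals $\small a\cdot b + \sqrt{1-a^2}\cdot b_2$.

```python
import numpy as np

a, b = 0.7, 0.4      # given corr(x1,x2) and corr(x1,x3); arbitrary example values
b2 = 0.5             # free entry, any value with b2**2 <= 1 - b**2 is allowed

# Cholesky factor built exactly as in the triangular scheme above
L = np.array([[1.0, 0.0,               0.0],
              [a,   np.sqrt(1 - a**2), 0.0],
              [b,   b2,                np.sqrt(1 - b**2 - b2**2)]])

R = L @ L.T                                   # a valid correlation matrix by construction
print(np.round(R, 4))
print(R[1, 2], a*b + np.sqrt(1 - a**2)*b2)    # corr(x2,x3) matches the row product
```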

For simplicity let us assume that $a$ and $b$ are positive (a small numerical check of all three cases follows after the list). Then it is also clear that we obtain the possible range for the correlation $\small corr(x_2,x_3) $ by setting $\small b_2 $

  • to its maximum, that is $\small b_2^2 = 1-b^2,\; b_2=\sqrt{1-b^2},\; b_3=0$ $\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & \sqrt{1-a^2} & . \\ x_3: & b & \sqrt{1-b^2} & 0 \\ \end{array} $
    and $\small corr(x_2,x_3)=a \cdot b + \sqrt{1-a^2}\cdot \sqrt{1-b^2} $
    If a=b we have then $\small corr(x_2,x_3)=a^2 + (1-a^2) = 1 $

  • to an intermediate value, namely $\small b_2 = 0$ (which, if we allow only positive values for all entries,
    is also its minimum), so that $\small b_3^2=1-b^2,\; b_3=\sqrt{1-b^2}$
    $\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & \sqrt{1-a^2} & . \\ x_3: & b & 0 & \sqrt{1-b^2} \\ \end{array} $
    and $\small corr(x_2,x_3)=a \cdot b + 0 $
    If a=b we have then $\small corr(x_2,x_3)=a^2 + 0 $

  • to its minimum (possibly negative, and then not minimal in absolute value), that is $\small b_2^2 = 1-b^2,\; b_2=-\sqrt{1-b^2},\; b_3=0$
    $\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & +\sqrt{1-a^2} & . \\ x_3: & b & - \sqrt{1-b^2} & 0 \\ \end{array} $
    and $\small corr(x_2,x_3)=a \cdot b - \sqrt{1-a^2}\cdot \sqrt{1-b^2} < a\cdot b $

    If $a=b$ we then get $\small corr(x_2,x_3)=a \cdot a - \sqrt{1-a^2}\cdot \sqrt{1-a^2} = 2a^2-1 < a^2 $, which can also turn out to be zero or even negative.
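As announced above the list, here is a small numerical check of the three cases (assuming numpy; the values of a and b are arbitrary examples): the attainable range of $\small corr(x_2,x_3)$ is $\small a\cdot b \pm \sqrt{(1-a^2)(1-b^2)}$, which for $\small a=b$ becomes the interval from $\small 2a^2-1$ to $\small 1$.

```python
import numpy as np

def corr23_range(a, b):
    """Attainable range of corr(x2,x3) given corr(x1,x2)=a and corr(x1,x3)=b."""
    spread = np.sqrt((1 - a**2) * (1 - b**2))
    return a*b - spread, a*b + spread

print(corr23_range(0.7, 0.4))   # general case
print(corr23_range(0.9, 0.9))   # a = b: (2a^2 - 1, 1) = (0.62, 1.0)
```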

Completely analogously this can be done when more variables are present in the correlation matrix, because only the number of rows/columns of the Cholesky factor increases accordingly.
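For instance, with four variables and corr(x1,x2) = corr(x2,x3) = corr(x3,x4) = a, the factor gets one additional row. The rough sketch below (assuming numpy; a = 0.9 is an arbitrary example value) sweeps the free direction in the third row and takes the extreme admissible fourth rows directly, which gives the attainable range of corr(x1,x4):

```python
import numpy as np

a = 0.9                              # all adjacent correlations; arbitrary example value
s = np.sqrt(1 - a**2)

row2 = np.array([a, s, 0.0, 0.0])    # second row of the Cholesky factor
u1 = np.array([-s, a, 0.0, 0.0])     # two unit vectors orthogonal to row2
u2 = np.array([0.0, 0.0, 1.0, 0.0])  # (last coordinate 0, so the factor stays triangular)

lo, hi = np.inf, -np.inf
for phi in np.linspace(0.0, 2*np.pi, 2001):
    # third row: unit norm and row2 . row3 = a hold by construction
    row3 = a*row2 + s*(np.cos(phi)*u1 + np.sin(phi)*u2)
    # the fourth row is a*row3 + s*w with w a unit vector orthogonal to row3, so
    # corr(x1,x4) = a*row3[0] + s*w[0] with w[0] in [-sqrt(1-row3[0]^2), +sqrt(1-row3[0]^2)]
    reach = np.sqrt(max(0.0, 1.0 - row3[0]**2))
    lo = min(lo, a*row3[0] - s*reach)
    hi = max(hi, a*row3[0] + s*reach)

print("attainable corr(x1,x4) roughly in [%.3f, %.3f]" % (lo, hi))
# for a = 0.9 this prints approximately [0.216, 1.000]
```

In this example the lower end (about 0.216 for a = 0.9) is noticeably smaller than the three-variable bound 2a^2 - 1 = 0.62, so longer chains allow the end-to-end correlation to drop further.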

(Remark: to keep the exposition of the principle simple, I did not attempt a more precise case distinction.)
