I asked this question on stats SE but have not found a suitable answer so far. Maybe someone can help.
Given n random variables x1,…,xn (one-dimensional).
The following is known (corr() = Pearson correlation):
corr(x1,x2) = a
corr(x2,x3) = a
The actual values of the random variables and their covariances are unknown though. Only some of their correlations are known.
From this, is it possible to calculate
corr(x3,x1) = ?
or give an estimate of the lowest possible correlation coefficient
corr(x3,x1) > a
More generally:
Given set of correlations
corr(x_i, x_{i+1}) for i = 1,…,c, with c < n
is it possible to either directly calculate
corr(x_1, x_{c+1})
or give a lower bound L for the coefficient with
corr(x_1, x_{c+1}) > L
Best Answer
I find it most intuitive to use the Cholesky decomposition of the correlation matrix to look at such questions. The Cholesky decomposition provides a lower triangular matrix which (given the variables $\small x_1,x_2,x_3 $) always has the form
$\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a_1 & a_2 & . \\ x_3: & b_1 & b_2 & b_3 \\ \end{array} $
which can be continued to more rows/columns, and where the dots denote (systematic) zeroes. The squares of the entries in each row sum to 1, and the correlations are the sums of products of the entries along two rows, for instance $\small corr(x_1,x_2)=1 \cdot a_1 $ or $\small corr(x_2,x_3)=a_1 \cdot b_1 + a_2 \cdot b_2 $.
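These two properties are easy to check numerically; here is a minimal sketch using numpy (the example correlation values are made up for illustration):

```python
import numpy as np

# A hypothetical 3x3 correlation matrix (values chosen for illustration)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# numpy returns the lower-triangular Cholesky factor with R = L @ L.T
L = np.linalg.cholesky(R)

# The squared entries of each row sum to 1 (the diagonal of R)
print(np.sum(L**2, axis=1))   # -> [1. 1. 1.]

# The product of two rows reproduces the corresponding correlation
print(np.dot(L[1], L[2]))     # -> 0.5 == corr(x2, x3)
```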
If we now want to know the possible range for the correlation $\small corr(x_2,x_3) $ given $\small corr(x_1,x_2)=a $ and $\small corr(x_1,x_3)=b $, then we know immediately that a and b must be the entries in the first column:
$\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & a_2 & . \\ x_3: & b & b_2 & b_3 \\ \end{array} $
and by the rule that the squares in each row sum to 1, the squared entries must be
$\qquad \small \begin{array} {r|lll} x_1^*: & 1 & . & . & \\ x_2^*: & a^2 & 1-a^2 & . \\ x_3^*: & b^2 & b_2^2 & 1-b^2-b_2^2 \\ \end{array} $
Here everything except $\small b_2$ is fixed or determined by the choice of $\small b_2$, which is itself limited to the obvious interval $\small 0 \le b_2^2 \le 1-b^2$.
Let's for simplicity assume that a and b are positive. Then it is also obvious that we get the upper end of the possible range for the correlation $\small corr(x_2,x_3) $ if we set $\small b_2 $
to its maximum, that is $\small b_2^2 = 1-b^2$, so $\small b_2=\sqrt{1-b^2}$, $\small b_3=0$: $\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & \sqrt{1-a^2} & . \\ x_3: & b & \sqrt{1-b^2} & 0 \\ \end{array} $
and $\small corr(x_2,x_3)=a \cdot b + \sqrt{1-a^2}\cdot \sqrt{1-b^2} $
If a=b we have then $\small corr(x_2,x_3)=a^2 + (1-a^2) = 1 $
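A quick numerical check of this extreme case (a sketch with numpy; the value a = 0.7 is an arbitrary assumption): the matrix with $\small corr(x_2,x_3)$ at its upper end is still a valid, positive semidefinite correlation matrix.

```python
import numpy as np

# Assumed example value: corr(x1,x2) = corr(x1,x3) = a
a = 0.7

# Upper end of the range: corr(x2,x3) = a*a + sqrt(1-a^2)*sqrt(1-a^2) = 1
c = a * a + np.sqrt(1 - a**2) * np.sqrt(1 - a**2)

# The resulting matrix remains positive semidefinite
# (its smallest eigenvalue is 0 up to rounding)
R = np.array([[1, a, a],
              [a, 1, c],
              [a, c, 1]])
print(round(c, 10))                 # -> 1.0
print(np.linalg.eigvalsh(R).min())  # ~0 (up to rounding)
```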
Setting $\small b_2$ instead to the intermediate value (which, when we allow only positive values for all entries,
is also its minimum), that is $\small b_2 = 0$, $\small b_3^2=1-b^2$, $\small b_3=\sqrt{1-b^2}$, gives
$\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & \sqrt{1-a^2} & . \\ x_3: & b & 0 & \sqrt{1-b^2} \\ \end{array} $
and $\small corr(x_2,x_3)=a \cdot b + 0 $
If a=b we have then $\small corr(x_2,x_3)=a^2 + 0 $
Finally, setting $\small b_2$ to its minimum (possibly negative, in which case the resulting correlation is not minimal in absolute value), that is $\small b_2^2 = 1-b^2$ with $\small b_2=-\sqrt{1-b^2}$ and $\small b_3=0$:
$\qquad \small \begin{array} {r|lll} x_1: & 1 & . & . & \\ x_2: & a & +\sqrt{1-a^2} & . \\ x_3: & b & - \sqrt{1-b^2} & 0 \\ \end{array} $
and $\small corr(x_2,x_3)=a \cdot b - \sqrt{1-a^2}\cdot \sqrt{1-b^2} < a\cdot b $
If a=b then we get $\small corr(x_2,x_3)=a \cdot a - \sqrt{1-a^2}\cdot \sqrt{1-a^2} = 2a^2-1 < a^2 $, which comes out zero for $\small a^2 = 1/2$ and even negative for $\small a^2 < 1/2$.
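As a sanity check of the whole range, one can scan candidate values of $\small corr(x_2,x_3)$ and keep those for which the 3×3 matrix remains a valid (positive semidefinite) correlation matrix; a sketch with numpy, using assumed values a = b = 0.8:

```python
import numpy as np

# Assumed example values for a = corr(x1,x2) and b = corr(x1,x3)
a, b = 0.8, 0.8

def is_valid(c, tol=1e-9):
    """c is a feasible corr(x2,x3) iff the matrix stays positive semidefinite."""
    R = np.array([[1, a, b],
                  [a, 1, c],
                  [b, c, 1]])
    return np.linalg.eigvalsh(R).min() >= -tol

# Scan candidate values of corr(x2,x3) on a fine grid
grid = np.linspace(-1, 1, 2001)
feasible = [c for c in grid if is_valid(c)]
print(min(feasible), max(feasible))

# Analytic bounds from the Cholesky argument: a*b +/- sqrt(1-a^2)*sqrt(1-b^2)
lo = a*b - np.sqrt(1 - a**2) * np.sqrt(1 - b**2)
hi = a*b + np.sqrt(1 - a**2) * np.sqrt(1 - b**2)
print(lo, hi)
```

The scanned endpoints agree with the analytic bounds $\small ab \pm \sqrt{1-a^2}\sqrt{1-b^2}$; for a = b = 0.8 the lower bound is $\small 2a^2-1 = 0.28$.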
This works completely analogously when more variables appear in the correlation matrix; only the number of rows/columns of the Cholesky factor increases accordingly.
(Remark: to keep the exposition of the principle simple, I did not attempt a more exact case distinction.)