I read somewhere that if $\Sigma$ is a covariance matrix, one can obtain the decomposition $\Sigma = \Sigma^{1/2}(\Sigma^{1/2})^T$ from the Cholesky decomposition. However, I am confused about how, because the Cholesky decomposition involves an upper and a lower triangular matrix. Can the square root here be upper or lower triangular, and if so, how?
[Math] If $\Sigma$ is a covariance matrix, how to obtain the decomposition $\Sigma = \Sigma^{1/2}(\Sigma^{1/2})^T$ from Cholesky Decomposition
linear-algebra, matrices, matrix-decomposition, statistics
Related Solutions
They are different in the sense that you can't get from $A^T\Sigma' A$ to $A \Sigma' A^T$ (i.e. using the same matrix, $A$, in both forms) but they are the same in the sense that if $B = A^T$ then $A^T\Sigma' A = B \Sigma' B^T$.
In general, if $\Sigma$ is a symmetric matrix, then we can choose a matrix $A$ consisting of eigenvectors of $\Sigma$ (as columns) such that $A^{-1} = A^T$ (i.e. $AA^T = I$). Then to diagonalize $\Sigma$, we can write $\Sigma = A \Sigma' A^T$, where $\Sigma'$ is diagonal.
Alternatively, if we take $B = A^T$ then we have $B\Sigma B^T = \Sigma'$ (swapping the roles of $\Sigma$ and $\Sigma'$) and so $\Sigma = B^T \Sigma' B$ instead. Here the matrix $B$ has eigenvectors of $\Sigma$ as rows.
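As a quick numerical illustration of these two conventions (my own sketch, not part of the original answer), both $\Sigma = A \Sigma' A^T$ with eigenvectors as columns and $B \Sigma B^T = \Sigma'$ with eigenvectors as rows can be checked with NumPy's `eigh`:

```python
import numpy as np

# Build an arbitrary symmetric positive semidefinite matrix to play
# the role of Sigma; the construction X^T X is just for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
Sigma = X.T @ X

eigvals, A = np.linalg.eigh(Sigma)   # columns of A are eigenvectors, A @ A.T = I
D = np.diag(eigvals)                 # the diagonal matrix Sigma' above

assert np.allclose(A @ D @ A.T, Sigma)   # Sigma = A Sigma' A^T
B = A.T                                  # rows of B are eigenvectors
assert np.allclose(B @ Sigma @ B.T, D)   # B Sigma B^T = Sigma'
```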
A classical way is to use induction. Let $A\in\mathbb{R}^{n\times n}$ be positive definite. The case $n=1$ is trivial: just take the square root. Assume that a Cholesky factorization exists for positive definite matrices of dimension $n-1$ and partition $A$ as $$ A = \begin{bmatrix} \tilde{A}&a\\a^T&\alpha \end{bmatrix}, $$ where $\tilde{A}\in\mathbb{R}^{(n-1)\times(n-1)}$. Since a principal submatrix of a positive definite matrix is positive definite, $\tilde{A}$ has a Cholesky factorization $\tilde{A}=\tilde{L}\tilde{L}^T$. Consider $$\tag{1} L_1^{-1}AL_1^{-T} := \begin{bmatrix} \tilde{L}^{-1}&0\\ 0&1 \end{bmatrix} \begin{bmatrix} \tilde{A}&a\\ a^T&\alpha \end{bmatrix} \begin{bmatrix} \tilde{L}^{-T}&0\\ 0&1 \end{bmatrix} = \begin{bmatrix} I&b\\ b^T&\alpha \end{bmatrix} =:B, \quad b:=\tilde{L}^{-1}a. $$ Next we eliminate $b$ by $$\tag{2} L_2^{-1}BL_2^{-T} := \begin{bmatrix} I&0\\-b^T&1 \end{bmatrix} \begin{bmatrix} I&b\\ b^T&\alpha \end{bmatrix} \begin{bmatrix} I&-b\\0&1 \end{bmatrix} = \begin{bmatrix} I&0\\0&\alpha-b^Tb \end{bmatrix} = \begin{bmatrix} I&0\\0&\alpha-a^T\tilde{A}^{-1}a \end{bmatrix}, $$ where $b^Tb = a^T\tilde{L}^{-T}\tilde{L}^{-1}a = a^T\tilde{A}^{-1}a$. The diagonal matrix on the right-hand side of (2) is the result of congruence transformations applied to $A$, so it is positive definite and $0<\alpha-a^T\tilde{A}^{-1}a=\lambda^2$ for some real $\lambda$. Set $$ L_3:=\begin{bmatrix}I&0\\0&\lambda\end{bmatrix} $$ so that $L_2^{-1}BL_2^{-T}=L_3L_3^T$. From (1) we have $$ L_2^{-1}L_1^{-1}AL_1^{-T}L_2^{-T}=L_3L_3^T, $$ so $$ A=LL^T, \quad L:=L_1L_2L_3 = \begin{bmatrix} \tilde{L}&0\\ 0&1 \end{bmatrix} \begin{bmatrix} I&0\\b^T&1 \end{bmatrix} \begin{bmatrix} I&0\\ 0&\lambda \end{bmatrix} = \begin{bmatrix} \tilde{L}&0\\ b^T&\lambda \end{bmatrix} = \begin{bmatrix} \tilde{L}&0\\ a^T\tilde{L}^{-T}&\lambda \end{bmatrix} $$ is a Cholesky factorization of $A$.
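The induction above translates directly into a recursive algorithm: factor the leading $(n-1)\times(n-1)$ block, then append the last row $[\,b^T \;\; \lambda\,]$. Here is a minimal Python sketch of that construction (the function name `cholesky_by_induction` and the example matrix are my own choices for illustration):

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_by_induction(A):
    """Lower triangular L with A = L @ L.T, built exactly as in the
    inductive proof above. For illustration only; assumes A is
    symmetric positive definite."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return np.array([[np.sqrt(A[0, 0])]])    # base case: scalar square root
    L_tilde = cholesky_by_induction(A[:-1, :-1])  # Cholesky of the leading block
    a = A[:-1, -1]
    alpha = A[-1, -1]
    b = solve_triangular(L_tilde, a, lower=True)  # b = L~^{-1} a
    lam = np.sqrt(alpha - b @ b)                  # alpha - a^T A~^{-1} a > 0
    L = np.zeros_like(A)
    L[:-1, :-1] = L_tilde
    L[-1, :-1] = b                                # last row: b^T = a^T L~^{-T}
    L[-1, -1] = lam
    return L

Sigma = np.array([[4.0, 2.0], [2.0, 3.0]])
L = cholesky_by_induction(Sigma)
assert np.allclose(L @ L.T, Sigma)
assert np.allclose(L, np.linalg.cholesky(Sigma))  # matches the library factor
```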
There is a source of non-uniqueness of the factorization in the choice of the sign of $\lambda$. As soon as one requires the signs of the diagonal terms of the Cholesky factors to be fixed (e.g., positive), the factorization is unique.
A simple way to confirm this is as follows. Assume $$ A=LL^T=MM^T $$ are two Cholesky factorizations of $A$. This gives $$\tag{3} I=L^{-1}MM^TL^{-T}=(L^{-1}M)(L^{-1}M)^T $$ and hence $$\tag{4} L^{-1}M=(L^{-1}M)^{-T}. $$ The left-hand and right-hand sides of (4) are, respectively, lower and upper triangular matrices, which means that $D:=L^{-1}M$ is both lower and upper triangular and hence diagonal. From (3) we have $I=D^2$, so $D$ is a diagonal matrix with $\pm 1$ diagonal entries and $M=LD$, meaning that two Cholesky factors of $A$ can differ only in the signs of their columns.
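A small numerical check of this uniqueness argument (a sketch, with an arbitrary example matrix): flipping column signs of the positive-diagonal factor via a diagonal $D$ with $\pm 1$ entries produces another valid factor $M = LD$:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(A)                     # the factor with positive diagonal
D = np.diag([1.0, -1.0])                      # one of the 2^n sign choices
M = L @ D                                     # second column negated

assert np.allclose(M @ M.T, A)                # still a valid factorization
assert np.allclose(np.linalg.inv(L) @ M, D)   # L^{-1} M is diagonal with +-1 entries
```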
Best Answer
Since $\Sigma^{-1}$ is a positive definite matrix, there is a lower triangular, nonsingular matrix $C$ such that
$$C^T\Sigma^{-1}C = I.$$
(Such a $C$ exists: write $\Sigma^{-1}=UU^T$ with $U$ upper triangular and nonsingular, a "reverse" Cholesky-type factorization that every positive definite matrix admits, and take $C=U^{-T}$, which is lower triangular.)
Hence, $$I = I^{-1} = (C^T\Sigma^{-1}C)^{-1} = C^{-1} \Sigma (C^T)^{-1} \\ \implies CC^T = CIC^T = CC^{-1} \Sigma (C^T)^{-1}C^T = \Sigma.$$
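Before the explicit $2 \times 2$ example below, here is a quick numerical sanity check of this identity (a sketch; the random $\Sigma$ is just a stand-in for a covariance matrix), using the lower triangular Cholesky factor as $C$:

```python
import numpy as np

# Arbitrary positive definite Sigma built as X^T X, for illustration.
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 3))
Sigma = X.T @ X

C = np.linalg.cholesky(Sigma)      # lower triangular, nonsingular
Sigma_inv = np.linalg.inv(Sigma)

assert np.allclose(C.T @ Sigma_inv @ C, np.eye(3))  # C^T Sigma^{-1} C = I
assert np.allclose(C @ C.T, Sigma)                  # Sigma = C C^T
```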
For example, for the $2 \times 2$ case,
$$\Sigma = \pmatrix{\sigma_1^2 & \rho\sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2} \\ \Sigma^{-1} = \frac{1}{1 - \rho^2}\pmatrix{\sigma_1^{-2} & -\rho\sigma_1^{-1} \sigma_2^{-1} \\ -\rho \sigma_1^{-1} \sigma_2^{-1} & \sigma_2^{-2}}.$$
$$C = \pmatrix{ \sigma_1 & 0 \\ \rho \sigma_2 & \sigma_2\sqrt{1-\rho^2}} .$$
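One can verify this closed-form factor against a library Cholesky routine; the numeric values of $\sigma_1$, $\sigma_2$, $\rho$ below are arbitrary choices for the check:

```python
import numpy as np

sigma1, sigma2, rho = 2.0, 1.5, 0.6
Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2           ]])
C = np.array([[sigma1,       0.0                         ],
              [rho * sigma2, sigma2 * np.sqrt(1 - rho**2)]])

assert np.allclose(C @ C.T, Sigma)
assert np.allclose(C, np.linalg.cholesky(Sigma))  # matches the positive-diagonal factor
```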