Solved – Calculating partitioned covariance matrices using R

covariance-matrix r

Let's say my data consists of 25 observations of four features, which are in groups of 2. So we'll call my variables $x_1, x_2, y_1, y_2$. We have a partitioned sample mean vector given by $\begin{bmatrix}\bar{x}\\ \bar{y}\end{bmatrix}$, where $\bar{x} = \begin{bmatrix}\bar{x_1}\\ \bar{x_2}\end{bmatrix}$, $\bar{y} = \begin{bmatrix}\bar{y_1}\\ \bar{y_2}\end{bmatrix}$. We can partition the covariance matrix $S$ by
\begin{equation*}
S = \begin{bmatrix}S_{xx}&S_{yx}\\S_{xy}&S_{yy}\end{bmatrix}
\end{equation*}
where $S_{xx}$ is the covariance matrix of just the $x$ variables and $S_{yy}$ is the covariance matrix of just the $y$ variables. I'm trying to write an R script that will calculate $S_{yx}$ using just the input data. Clearly $S_{yx}$ consists of
\begin{equation*}
S_{yx} = \begin{bmatrix}Cov(x_1, y_1)&Cov(x_1, y_2)\\Cov(x_2, y_1)&Cov(x_2, y_2)\end{bmatrix}
\end{equation*}
So if $X$ is the $25 \times 2$ matrix of $x$ data and $Y$ is the $25 \times 2$ matrix of $y$ data, then my idea for calculating it is like so:
\begin{equation*}
S_{yx} = \frac{1}{n-1} \big(X^TY - n\bar{x}\bar{y}^T\big)
\end{equation*}
However, when I type into R:

(1/(n-1)) * ((t(X) %*% Y) - (n * xbar %*% t(ybar)))

I get results which are wildly different from the corresponding results in the full covariance matrix. Is there a problem with my reasoning or my R code?

EDIT: ftp://ftp.wiley.com/public/sci_tech_med/multivariate_analysis_3e/ My input is the "T3_8_SONS.DAT" file in this directory. I don't know if there is a better way to include my input, sorry.

Best Answer

You can make your life a lot easier by using R's covariance function cov. All you need to do is read the data into a data frame, change the order of the columns, and then compute the covariance matrix. The partitions can then be plucked from the main covariance matrix. Here is some code to get $S$, $S_{xx}$, $S_{yy}$, and $S_{yx}$.

> #Read in Data
> mydf<-read.table("D:\\T3_8_SONS.DAT")
> names(mydf)<-c("y1", "y2", "x1", "x2")
> #Change column order so S is created in the same order as in OP's question
> mydf2<-data.frame(mydf$x1, mydf$x2, mydf$y1, mydf$y2)
> names(mydf2)<-c("x1", "x2", "y1", "y2")
> #Print to compare to Table 3.8 in book
> head(mydf2)
   x1  x2  y1  y2
1 179 145 191 155
2 201 152 195 149
3 185 149 181 148
4 188 149 183 153
5 171 142 176 144
6 192 152 208 157
> #Obtain full variance covariance matrix
> S <- cov(mydf2)
> #Obtain covariance partitioned matrices
> Sxx <- S[1:2,1:2]
> Syy <- S[3:4,3:4]
> Syx <- S[1:2, 3:4]
> #Sxy is the lower-left block, i.e. the transpose of Syx
> Sxy <- S[3:4, 1:2]
> S
          x1       x2       y1       y2
x1 100.80667 56.54000 69.66167 51.31167
x2  56.54000 45.02333 46.11167 35.05333
y1  69.66167 46.11167 95.29333 52.86833
y2  51.31167 35.05333 52.86833 54.36000
> Sxx
         x1       x2
x1 100.8067 56.54000
x2  56.5400 45.02333
> Syy
         y1       y2
y1 95.29333 52.86833
y2 52.86833 54.36000
> Syx
         y1       y2
x1 69.66167 51.31167
x2 46.11167 35.05333

We could use your method, which is mathematically correct, to obtain $S_{yx}$ too, but it's more work. Here are the alternate calculations:

> #Your method
> n<-25
> xbar<-apply(mydf2, 2, mean)[1:2]
> ybar<-apply(mydf2, 2, mean)[3:4]
> Syx2<-(t(mydf2[1:2])%*%as.matrix(mydf2[3:4])-n*(xbar%*%t(ybar)))/(n-1)
> Syx2
         y1       y2
x1 69.66167 51.31167
x2 46.11167 35.05333

Note that the $S_{yx}$ here matches the computations I provided previously for $S_{yx}$.
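As a cross-check that does not depend on the book's data file, here is a minimal sketch on simulated data. It assumes nothing beyond base R; the matrices `X` and `Y` are made-up stand-ins for the $x$ and $y$ columns, not the values from Table 3.8. It compares three routes to $S_{yx}$: calling `cov()` with two arguments, the formula from the question, and centring the columns before taking the cross-product.

```r
# Simulated stand-in for a 25 x 4 data set (NOT the book's data).
set.seed(1)
n <- 25
X <- matrix(rnorm(n * 2, mean = 180, sd = 10), ncol = 2,
            dimnames = list(NULL, c("x1", "x2")))
Y <- X + matrix(rnorm(n * 2, sd = 5), ncol = 2)
colnames(Y) <- c("y1", "y2")

xbar <- colMeans(X)
ybar <- colMeans(Y)

# 1. Built-in cross-covariance: entry (i, j) is Cov(x_i, y_j).
Syx_cov <- cov(X, Y)

# 2. The formula from the question; outer(xbar, ybar) equals xbar %*% t(ybar).
Syx_formula <- (t(X) %*% Y - n * outer(xbar, ybar)) / (n - 1)

# 3. Centre each column first, then take the cross-product.
Xc <- sweep(X, 2, xbar)
Yc <- sweep(Y, 2, ybar)
Syx_centred <- crossprod(Xc, Yc) / (n - 1)

# Compare the three results up to floating-point error.
all.equal(Syx_cov, Syx_formula, check.attributes = FALSE)
all.equal(Syx_cov, Syx_centred, check.attributes = FALSE)
```

If the formula route disagrees with `cov()` on real data, a likely place to look is whether the columns passed as `X` and `Y` are the ones intended, since the data file lists the $y$ columns before the $x$ columns.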

Covariance Matrix – Are Sum and Product of Two Covariance Matrices Also Covariance Matrices?

Background

A covariance matrix $\mathbb{A}$ for a vector of random variables $X=(X_1, X_2, \ldots, X_n)^\prime$ embodies a procedure to compute the variance of any linear combination of those random variables. The rule is that for any vector of coefficients $\lambda = (\lambda_1, \ldots, \lambda_n)$,

$$\operatorname{Var}(\lambda X) = \lambda \mathbb{A} \lambda ^\prime.\tag{1}$$

In other words, the rules of matrix multiplication describe the rules of variances.

Two properties of $\mathbb{A}$ are immediate and obvious:

1. Because variances are expectations of squared values, they can never be negative. Thus, for all vectors $\lambda$, $$0 \le \operatorname{Var}(\lambda X) = \lambda \mathbb{A} \lambda ^\prime.$$ Covariance matrices must be non-negative-definite.
2. Variances are just numbers--or, if you read the matrix formulas literally, they are $1\times 1$ matrices. Thus, they do not change when you transpose them. Transposing $(1)$ gives $$\lambda \mathbb{A} \lambda ^\prime = \operatorname{Var}(\lambda X) = \operatorname{Var}(\lambda X) ^\prime = \left(\lambda \mathbb{A} \lambda ^\prime\right)^\prime = \lambda \mathbb{A}^\prime \lambda ^\prime.$$ Since this holds for all $\lambda$, $\mathbb{A}$ must equal its transpose $\mathbb{A}^\prime$: covariance matrices must be symmetric.

The deeper result is that any non-negative-definite symmetric matrix $\mathbb{A}$ is a covariance matrix. This means there actually is some vector-valued random variable $X$ with $\mathbb{A}$ as its covariance. We may demonstrate this by explicitly constructing $X$. One way is to notice that the (multivariate) density function $f(x_1,\ldots, x_n)$ with the property $$\log(f) \propto -\frac{1}{2} (x_1,\ldots,x_n)\mathbb{A}^{-1}(x_1,\ldots,x_n)^\prime$$ has $\mathbb{A}$ for its covariance. (Some delicacy is needed when $\mathbb{A}$ is not invertible--but that's just a technical detail.)
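To make rule $(1)$ and the construction concrete, here is a small sketch in base R. The matrix `A` and the coefficients `lambda` are hypothetical values chosen only for illustration, and the sample is built from independent standard normals via a Cholesky factor, which is one possible construction rather than the only one.

```r
# Hypothetical example matrix and coefficients, chosen only for illustration.
A <- matrix(c(4, 2, 1,
              2, 3, 2,
              1, 2, 3), ncol = 3, byrow = TRUE)
lambda <- c(1, -2, 0.5)

# The two properties: symmetry and non-negative eigenvalues.
isSymmetric(A)
all(eigen(A, symmetric = TRUE)$values >= 0)

# Construct a sample whose covariance is A: if the rows of Z have identity
# covariance and A = t(R) %*% R (Cholesky), the rows of Z %*% R have
# covariance A.
set.seed(42)
Z <- matrix(rnorm(50000 * 3), ncol = 3)
X <- Z %*% chol(A)

# Rule (1) for the population: lambda A lambda'.
theoretical <- drop(t(lambda) %*% A %*% lambda)

# Sample variance of the linear combination; close to the value above.
empirical <- var(as.vector(X %*% lambda))

# The same rule holds exactly for the sample covariance matrix.
from_sample_cov <- drop(t(lambda) %*% cov(X) %*% lambda)

c(theoretical = theoretical, empirical = empirical,
  from_sample_cov = from_sample_cov)
```

The last two numbers agree up to rounding because the sample covariance obeys the same quadratic-form rule as the population covariance; the first differs from them only by sampling error.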

Solutions

Let $\mathbb{X}$ and $\mathbb{Y}$ be covariance matrices. Obviously they are square; and if their sum is to make any sense they must have the same dimensions. We need only check the two properties.

1. The sum.
   - Symmetry $$(\mathbb{X}+\mathbb{Y})^\prime = \mathbb{X}^\prime + \mathbb{Y}^\prime = (\mathbb{X} + \mathbb{Y})$$ shows the sum is symmetric.
   - Non-negative definiteness. Let $\lambda$ be any vector. Then $$\lambda(\mathbb{X}+\mathbb{Y})\lambda^\prime = \lambda \mathbb{X}\lambda^\prime + \lambda \mathbb{Y}\lambda^\prime \ge 0 + 0 = 0$$ proves the point using basic properties of matrix multiplication.
2. I leave this as an exercise.
3. This one is tricky. One method I use to think through challenging matrix problems is to do some calculations with $2\times 2$ matrices. There are some common, familiar covariance matrices of this size, such as $$\pmatrix{a & b \\ b & a}$$ with $a^2 \ge b^2$ and $a \ge 0$. The concern is that $\mathbb{XY}$ might not be definite: that is, could it produce a negative value when computing a variance? If it will, then we had better have some negative coefficients in the matrix. That suggests considering $$\mathbb{X} = \pmatrix{a & -1 \\ -1 & a}$$ for $a \ge 1$. To get something interesting, we might gravitate initially to matrices $\mathbb{Y}$ with different-looking structures. Diagonal matrices come to mind, such as $$\mathbb{Y} = \pmatrix{b & 0 \\ 0 & 1}$$ with $b\ge 0$. (Notice how we may freely pick some of the coefficients, such as $-1$ and $1$, because we can rescale all the entries in any covariance matrix without changing its fundamental properties. This simplifies the search for interesting examples.)

I leave it to you to compute $\mathbb{XY}$ and test whether it always is a covariance matrix for any allowable values of $a$ and $b$.
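For readers who want to experiment numerically, here is a minimal sketch of such a test in base R. The helper `is_cov_matrix()` is a hypothetical name introduced here, not part of any package; it simply checks the two properties from the Background section (symmetry and non-negative eigenvalues). The values of `a` and `b` are one arbitrary allowable choice.

```r
# Hypothetical helper: TRUE when M is symmetric and non-negative-definite.
is_cov_matrix <- function(M, tol = 1e-8) {
  if (!isSymmetric(M, tol = tol)) return(FALSE)
  all(eigen(M, symmetric = TRUE, only.values = TRUE)$values >= -tol)
}

# One arbitrary choice of the free parameters (a >= 1, b >= 0).
a <- 2
b <- 3
X <- matrix(c(a, -1, -1, a), nrow = 2)
Y <- matrix(c(b, 0, 0, 1), nrow = 2)

# Both inputs should pass the check.
is_cov_matrix(X)
is_cov_matrix(Y)

# The sum.
is_cov_matrix(X + Y)

# The product: print it and look at the off-diagonal entries first.
XY <- X %*% Y
XY
is_cov_matrix(XY)
```

Trying several values of `a` and `b`, including `b <- 1`, shows which of the two properties is at stake for the product.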

Solved – Raising a variance-covariance matrix to a negative half power

What the operation $C^{-\frac{1}{2}}$ refers to is the decorrelation of the underlying sample into uncorrelated components; $C^{-\frac{1}{2}}$ is used as a whitening matrix. This is a natural operation when looking to analyse each column/source of the original data matrix $A$ (having covariance matrix $C$) through an uncorrelated matrix $Z$. The most common way of implementing such whitening is through the Cholesky decomposition (where we use $C = LL^T$, see this thread for an example with "colouring" a sample), but here we use the slightly less common Mahalanobis whitening (where we use $C= C^{0.5} C^{0.5}$). The whole operation in R would go a bit like this:

set.seed(323)
N <- 10000;
p <- 3;
# Define the real C
( C <- base::matrix( data =c(4,2,1,2,3,2,1,2,3), ncol = 3, byrow= TRUE) ) 
# Generate the uncorrelated data (ground truth)
Z <- base::matrix( ncol = 3, rnorm(N*p) ) 
# Estimate the colouring matrix C^0.5
CSqrt <- expm::sqrtm(C)
# "Colour" the data / usually we use Cholesky (LL^T) but using C^0.5 valid too
A <- t( CSqrt %*% t(Z) ) 
# Get the sample estimated C 
( CEst <- round( digits = 2, cov( A )) )
# Estimate the whitening matrix C^-0.5
CEstInv <-  expm::sqrtm(solve(CEst))
# Whiten the data
ZEst <-  t(CEstInv %*% t(A) )
# Check that indeed we have whitened the data 
( round( digits = 1, cov(cbind(ZEst, Z) ) ) )

So to succinctly answer the question raised:

1. It means that we can decorrelate the sample $A$ associated with the covariance matrix $C$ in such a way that we get uncorrelated components. This is commonly referred to as whitening.
2. The general linear algebra idea it assumes is that a (covariance) matrix can be used as a linear operator (to generate a correlated sample by "colouring"), and so can its inverse (to decorrelate/"whiten" a sample).
3. Yes, the easiest way to raise a valid covariance matrix to any power (the negative square root is just a special case) is by using its eigen-decomposition: $C = V \Lambda V^T$, with $V$ being an orthonormal matrix holding the eigenvectors of $C$ and $\Lambda$ being a diagonal matrix holding the eigenvalues. Then we can readily change the diagonal matrix $\Lambda$ as we wish and get the relevant result.

A small code snippet showcasing point 3.

# Get the eigendecomposition of the covariance matrix
myEigDec <- eigen(cov(A))
# Use the eigendecomposition to get the inverse square root
myEigDec$vectors %*% diag( 1/ sqrt( myEigDec$values) ) %*% t(myEigDec$vectors)
# Use the eigendecomposition to get the "negative half power" (same as above)
myEigDec$vectors %*% diag( ( myEigDec$values)^(-0.5) ) %*% t(myEigDec$vectors)
# And to confirm by the R library expm
solve(expm::sqrtm(cov(A)))
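For comparison, the Cholesky route mentioned at the start can be sketched in base R alone, without the expm package. This is an illustrative sketch under the same assumed $C$ as above, not a replacement for the Mahalanobis version; the variable names `R`, `REst` and `ZEst` are chosen here for the example.

```r
set.seed(323)
C <- matrix(c(4, 2, 1, 2, 3, 2, 1, 2, 3), ncol = 3, byrow = TRUE)

# chol() returns the upper-triangular R with C = t(R) %*% R, so L = t(R).
R <- chol(C)

# Colour: rows of Z have identity covariance, rows of A have covariance ~ C.
Z <- matrix(rnorm(10000 * 3), ncol = 3)
A <- Z %*% R

# Whiten with the Cholesky factor of the *sample* covariance of A.
REst <- chol(cov(A))
ZEst <- A %*% solve(REst)

# The sample covariance of the whitened data is the identity up to
# floating-point error.
round(cov(ZEst), 10)
```

Cholesky and Mahalanobis whitening both produce uncorrelated columns, but the whitened samples themselves differ by a rotation, so the two versions of `ZEst` are not expected to match entry by entry.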
