Solved – Calculating partitioned covariance matrices using R

covariance-matrixr

Let's say my data consists of 25 observations of four features, which are in groups of 2. So we'll call my variables $x_1, x_2, y_1, y_2$. We have a partitioned sample mean vector given by $\begin{bmatrix}\bar{x}\\ \bar{y}\end{bmatrix}$, where $\bar{x} = \begin{bmatrix}\bar{x_1}\\ \bar{x_2}\end{bmatrix}$, $\bar{y} = \begin{bmatrix}\bar{y_1}\\ \bar{y_2}\end{bmatrix}$. We can partition the covariance matrix $S$ by
\begin{equation*}
S = \begin{bmatrix}S_{xx}&S_{yx}\\S_{xy}&S_{yy}\end{bmatrix}
\end{equation*}

where $S_{xx}$ is the covariance matrix of just the $x$ variables and $S_{yy}$ is the covariance matrix of just the $y$ variables. I'm trying to write an R script that will calculate $S_{yx}$ using just the input data. Clearly $S_{yx}$ consists of
\begin{equation*}
S_{yx} = \begin{bmatrix}Cov(x_1, y_1)&Cov(x_1, y_2)\\Cov(x_2, y_1)&Cov(x_2, y_2)\end{bmatrix}
\end{equation*}

So if $X$ is the $25 \times 2$ matrix of $x$ data and $Y$ is the $25 \times 2$ matrix of $Y$ data, then my idea for calculating is is like so:
\begin{equation*}
S_{yx} = \frac{1}{n-1} \big(X^TY – n\bar{x}\bar{y}^T\big)
\end{equation*}

However, when I type into R:

(1/(n-1)) * ((t(X) %*% Y) - (n * xbar %*% t(ybar)))

I get results which are wildly different from the corresponding results in the full covariance matrix. Is there a problem with my reasoning or my R code?

EDIT: ftp://ftp.wiley.com/public/sci_tech_med/multivariate_analysis_3e/ My input is the "T3_8_SONS.DAT" file in this directory. I don't know if there is a better way to include my input, sorry.

Best Answer

You can make your life a lot easier by using R's covariance function cov. So, all you need to do is read in the data to a dataframe, change the order of the columns, and then compute the covariance matrix. The partitions can then be plucked from the main covariance matrix. Here is some code to get $S$, $Sxx$, $Syx$.

> #Read in Data
> mydf<-read.table("D:\\T3_8_SONS.DAT")
> names(mydf)<-c("y1", "y2", "x1", "x2")
> #Change column order so S is created in the same order as in OP's question
> mydf2<-data.frame(mydf$x1, mydf$x2, mydf$y1, mydf$y2)
> names(mydf2)<-c("x1", "x2", "y1", "y2")
> #Print to compare to Table 3.8 in book
> head(mydf2)
   x1  x2  y1  y2
1 179 145 191 155
2 201 152 195 149
3 185 149 181 148
4 188 149 183 153
5 171 142 176 144
6 192 152 208 157
> #Obtain full variance covariance matrix
> S <- cov(mydf2)
> #Obtain covariance partitioned matrices
> Sxx <- S[1:2,1:2]
> Syy <- S[3:4,3:4]
> Syx <- Sxy <- S[1:2, 3:4]
> S
          x1       x2       y1       y2
x1 100.80667 56.54000 69.66167 51.31167
x2  56.54000 45.02333 46.11167 35.05333
y1  69.66167 46.11167 95.29333 52.86833
y2  51.31167 35.05333 52.86833 54.36000
> Sxx
         x1       x2
x1 100.8067 56.54000
x2  56.5400 45.02333
> Syy
         y1       y2
y1 95.29333 52.86833
y2 52.86833 54.36000
> Syx
         y1       y2
x1 69.66167 51.31167
x2 46.11167 35.05333

We could use your method, which is mathematically correct, to obtain $Syx$ too, but it's more work. Here are the alternate calculations:

> #Your method
> n<-25
> xbar<-apply(mydf2, 2, mean)[1:2]
> ybar<-apply(mydf2, 2, mean)[3:4]
> Syx2<-(t(mydf2[1:2])%*%as.matrix(mydf2[3:4])-n*(xbar%*%t(ybar)))/(n-1)
> Syx2
         y1       y2
x1 69.66167 51.31167
x2 46.11167 35.05333
> 

Note that the $Syx$ here matches the computations I provided previously for $Syx$.

Related Question