Let's say my data consists of 25 observations of four features, which are in groups of 2. So we'll call my variables $x_1, x_2, y_1, y_2$. We have a partitioned sample mean vector given by $\begin{bmatrix}\bar{x}\\ \bar{y}\end{bmatrix}$, where $\bar{x} = \begin{bmatrix}\bar{x_1}\\ \bar{x_2}\end{bmatrix}$, $\bar{y} = \begin{bmatrix}\bar{y_1}\\ \bar{y_2}\end{bmatrix}$. We can partition the covariance matrix $S$ by
\begin{equation*}
S = \begin{bmatrix}S_{xx}&S_{yx}\\S_{xy}&S_{yy}\end{bmatrix}
\end{equation*}
where $S_{xx}$ is the covariance matrix of just the $x$ variables and $S_{yy}$ is the covariance matrix of just the $y$ variables. I'm trying to write an R script that will calculate $S_{yx}$ using just the input data. Clearly $S_{yx}$ consists of
\begin{equation*}
S_{yx} = \begin{bmatrix}Cov(x_1, y_1)&Cov(x_1, y_2)\\Cov(x_2, y_1)&Cov(x_2, y_2)\end{bmatrix}
\end{equation*}
So if $X$ is the $25 \times 2$ matrix of $x$ data and $Y$ is the $25 \times 2$ matrix of $Y$ data, then my idea for calculating is is like so:
\begin{equation*}
S_{yx} = \frac{1}{n-1} \big(X^TY – n\bar{x}\bar{y}^T\big)
\end{equation*}
However, when I type into R:
(1/(n-1)) * ((t(X) %*% Y) - (n * xbar %*% t(ybar)))
I get results which are wildly different from the corresponding results in the full covariance matrix. Is there a problem with my reasoning or my R code?
EDIT: ftp://ftp.wiley.com/public/sci_tech_med/multivariate_analysis_3e/ My input is the "T3_8_SONS.DAT" file in this directory. I don't know if there is a better way to include my input, sorry.
Best Answer
You can make your life a lot easier by using R's covariance function
cov
. So, all you need to do is read in the data to a dataframe, change the order of the columns, and then compute the covariance matrix. The partitions can then be plucked from the main covariance matrix. Here is some code to get $S$, $Sxx$, $Syx$.We could use your method, which is mathematically correct, to obtain $Syx$ too, but it's more work. Here are the alternate calculations:
Note that the $Syx$ here matches the computations I provided previously for $Syx$.