Solved – Covariance Matrix vs. Pairwise Covariance Matrix

covariance-matrix, r

I found this equation here to calculate the covariance matrix of any number of variables using matrix algebra:
$$\frac{1}{N} (X - 1\bar{x}^T)^T (X - 1\bar{x}^T)
$$

for a given matrix $X$ with $N$ samples, where $1$ is an $N \times 1$ vector of ones and $\bar{x}$ is the column vector of column means. The following is the SAS code I found at the link above.

ONES = J(N, 1, 1);
meanvec = (1/N)*t(X)*ONES;
mean_matrix = ONES*t(meanvec);
cov_matrix = (1/N) * t(X - mean_matrix) * (X - mean_matrix);

However, I don't have SAS on my workstation, so I converted this to R, which is nearly identical.

ONES <- matrix(1, nrow=N, ncol=1)
meanvec <- (1/N) * t(X) %*% ONES
mean_matrix <- ONES %*% t(meanvec)
cov_matrix <- (1/N) * t(X - mean_matrix) %*% (X - mean_matrix)
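
As a rough sketch, the same divide-by-$N$ computation can be written more compactly with base R's scale() and crossprod(). The matrix X_demo and the names Xc and cov_matrix_N are illustrative assumptions and do not come from the original post; the explicit ones-vector construction is repeated only to check that both routes agree.

```r
# Illustrative data (not from the original post): 4 samples, 2 variables
X_demo <- matrix(c(1, 2, 3, 4,
                   2, 4, 6, 9), ncol = 2)
N <- nrow(X_demo)

# Centre each column on its mean; scale = FALSE leaves the spread untouched
Xc <- scale(X_demo, center = TRUE, scale = FALSE)

# crossprod(Xc) computes t(Xc) %*% Xc; dividing by N gives the divide-by-N version
cov_matrix_N <- crossprod(Xc) / N

# The explicit ones-vector construction from the post, for comparison
ONES <- matrix(1, nrow = N, ncol = 1)
meanvec <- (1/N) * t(X_demo) %*% ONES
mean_matrix <- ONES %*% t(meanvec)
cov_matrix <- (1/N) * t(X_demo - mean_matrix) %*% (X_demo - mean_matrix)

# Compare values only; scale() attaches extra attributes to Xc
isTRUE(all.equal(cov_matrix_N, cov_matrix, check.attributes = FALSE))
```

Both versions divide by N, so neither is expected to match cov() exactly.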

Now, here is where I run into problems. Let's take this sample matrix $X$:

X
     [,1] [,2] [,3]
[1,]   90   60   90
[2,]   90   90   30
[3,]   60   60   60
[4,]   60   60   90
[5,]   30   30   30

If I run the above code I get the following covariance matrix.

cov_matrix
     [,1] [,2] [,3]
[1,]  504  360  180
[2,]  360  360    0
[3,]  180    0  720

But when I run the cov function from the stats package I get

cov(X)
     [,1] [,2] [,3]
[1,]  630  450  225
[2,]  450  450    0
[3,]  225    0  900

which are the pairwise covariances between columns (verified with cov(X[,1], X[,1])). Sorry if I am missing some basic math concept here, but what is the difference? Why are both described as returning "a covariance matrix" when they return different "kinds" of covariance matrices?

This is strictly a learning exercise for me, so I would appreciate any further information you could provide to help me understand these differences.

Best Answer

Answered in comments:

A covariance matrix is just a matrix of pairwise covariances, so I'm not sure about the distinction you're making.

– dsaxton
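
To illustrate the comment above, here is a minimal sketch using the sample matrix from the question: filling a matrix entry by entry with pairwise column covariances reproduces cov(X). The name pairwise is an illustrative choice, not something from the original thread.

```r
# Sample matrix from the question (5 samples, 3 variables), filled column by column
X <- matrix(c(90, 90, 60, 60, 30,
              60, 90, 60, 60, 30,
              90, 30, 60, 90, 30), ncol = 3)

p <- ncol(X)

# Build the matrix one entry at a time from pairwise column covariances
pairwise <- matrix(NA_real_, nrow = p, ncol = p)
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    pairwise[i, j] <- cov(X[, i], X[, j])
  }
}

# Both routes give the same matrix
isTRUE(all.equal(pairwise, cov(X)))
```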

Use $N-1$ in place of $N$ to obtain the so-called "unbiased" version.

– rvl

See (1) the help page for cov; (2) "How exactly did statisticians agree to using (n-1) as the unbiased estimator for population variance without simulation?"; and (3) "Intuitive explanation for dividing by $n-1$ when calculating standard deviation?" for intuition. For yet more information, search for "standard deviation correction".

– whuber
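
The $N-1$ suggestion in the comments can be checked with a short sketch on the sample matrix from the question. The names Xc, cov_N and cov_Nm1 are illustrative assumptions. The two results differ only by the factor $N/(N-1)$, which is $5/4$ here; for example, $504 \times 5/4 = 630$.

```r
# Sample matrix from the question (5 samples, 3 variables), filled column by column
X <- matrix(c(90, 90, 60, 60, 30,
              60, 90, 60, 60, 30,
              90, 30, 60, 90, 30), ncol = 3)
N <- nrow(X)

# Subtract each column's mean (sweep's default operation is subtraction)
Xc <- sweep(X, 2, colMeans(X))

cov_N   <- t(Xc) %*% Xc / N        # divide by N, as in the question's code
cov_Nm1 <- t(Xc) %*% Xc / (N - 1)  # divide by N - 1, as cov() does

# Dividing by N - 1 matches cov(X)
isTRUE(all.equal(cov_Nm1, cov(X)))

# Rescaling the divide-by-N matrix by N / (N - 1) also matches cov(X)
isTRUE(all.equal(cov_N * N / (N - 1), cov(X)))
```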
