Chi Square Test – Determining Null and Alternative Hypothesis for an Identity Matrix

correlation, correlation matrix, hypothesis testing

Say I had a correlation matrix:
$$
M =
\begin{bmatrix}
0.8 & 0.1 & 0.1\\
0.3 & 0.7 & 0.1 \\
-0.1 & -0.2 & 0.9
\end{bmatrix}
$$

I want to show that it is approximately an identity matrix, $M \approx I$.

I was looking at the following R package: https://www.rdocumentation.org/packages/psych/versions/2.1.6/topics/cortest.mat

It uses a method derived from Steiger (1980), who observed that

"the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix)"

I would translate this as:
$$
\sum^{N}_{i=1}\sum^{M}_{j=1}x_{i,j}^2 \sim \chi^{2}\; \mathrm{iff}\; \sum^{N}_{i=1}\sum^{M}_{j=1}x_{i,j}^2 = 0
$$

This doesn't make much sense to me. Using a simple identity matrix, I can easily show that this isn't true. I must be misunderstanding it.
E.g.
$$
I_{3} =
\begin{bmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{bmatrix}
$$

$$
\sum^{N}_{i=1}\sum^{M}_{j=1}(I_{3})_{i,j}^2 = 3
$$

When it comes to formulating the null and alternative hypotheses, the best I can get is:

H0: $\sum^{N}_{i=1}\sum^{M}_{j=1}x_{i,j}^2 = 0$

HA: $\sum^{N}_{i=1}\sum^{M}_{j=1}x_{i,j}^2 \neq 0$

This somewhat makes sense. I want to prove that I have an identity matrix, and if it is true that summing the squared elements of the identity matrix gives a nonzero value, then this seems to check out. But the math I am coming up with here just feels inconsistent.

I ran the test on an identity matrix and found:

    > cortest(diag(10), cor=FALSE)
    Tests of correlation matrices
    Call:cortest(R1 = diag(10), cor = FALSE)
     Chi Square value Inf  with df =  100   with probability < 0

Looks great! Assuming "with probability" is the p-value, that's what I want to see.

Then I ran the test again on a matrix whose entries are uniformly distributed on [-1, 1]:

    > X <- matrix(runif(100,-1,1), ncol=10)
    > cortest(X, cor=FALSE)
    Tests of correlation matrices
    Call:cortest(R1 = X, cor = FALSE)
     Chi Square value 7139.26  with df =  100   with probability < 0

This also has a very low p-value, so maybe the first result was not so great after all and I am not interpreting the results correctly.

I checked out the original 1980 article on this, but it was too advanced for me to understand, and I cannot find any resources on this particular test online.

So my main question is:

  1. What are the null and alternative hypotheses?

Best Answer

According to the paper, the test is as follows. Let $m$ denote the number of features and define $k = (m^2 - m)/2$, the number of unique off-diagonal elements in the correlation matrix. Furthermore, define the vector $p$ (of length $k$), which contains all of these elements. Then (Eq. 16 in the paper),

$$H_0: p = p_0$$

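or in your case $H_0: p = 0$, i.e. every off-diagonal population correlation is zero and the population correlation matrix is the identity, and the alternative $H_1: p \neq 0$. A small p-value is therefore evidence against the identity structure; a large one only means the data are consistent with it.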
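
As a concrete illustration (the symbol $\rho_{i,j}$ for the population correlation between features $i$ and $j$ is notation assumed here, not taken from the paper), a $3 \times 3$ matrix like the one in the question has $m = 3$, so:

$$
k = \frac{3^2 - 3}{2} = 3, \qquad
p = \begin{bmatrix} \rho_{1,2} \\ \rho_{1,3} \\ \rho_{2,3} \end{bmatrix}, \qquad
H_0: \rho_{1,2} = \rho_{1,3} = \rho_{2,3} = 0
$$

The diagonal entries do not appear in $p$ at all, which is why the ones on the diagonal of an identity matrix play no role in the hypothesis.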

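The test statistic does not sum the diagonal elements (see Eq. 22 in the paper), only the off-diagonal ones. It is $(N-3)\sum \sum_{i < j} z_{i,j}^2$, where $N$ is the number of observations and $z_{i,j}$ is the Fisher transformation of the $i,j$ element of the sample correlation matrix; the factor $N-3$ is needed because each $z_{i,j}$ has a variance of about $1/(N-3)$ under $H_0$. The statistic is approximately distributed as $\chi^2_{k-q}$, where $q$ is the number of common correlations that have to be estimated from the data (in your case 0, since every off-diagonal correlation is fixed at 0 by the hypothesis, so the degrees of freedom are $k$).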
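
Here is a minimal sketch of that computation in base R. It is not the `psych` implementation: the function name `steiger_identity_test`, the example matrix, and the sample size `n = 50` are all made up for illustration. It also rests on two assumptions: the sum of squared Fisher z values is scaled by `n - 3` (each z has a variance of roughly `1 / (n - 3)` under the null), and `q` defaults to 0 because nothing is estimated when every correlation is fixed at 0.

```r
# Sketch of a chi-square test of H0: all off-diagonal correlations are 0.
# R: symmetric sample correlation matrix
# n: number of observations behind R (must be supplied by the user)
# q: number of common correlations estimated from the data (assumed 0 here)
steiger_identity_test <- function(R, n, q = 0) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R), isSymmetric(unname(R)))
  m <- nrow(R)
  k <- (m^2 - m) / 2            # number of unique off-diagonal elements
  r <- R[upper.tri(R)]          # the elements with i < j; the diagonal is ignored
  z <- atanh(r)                 # Fisher transformation: 0.5 * log((1 + r) / (1 - r))
  stat <- (n - 3) * sum(z^2)    # assumed scaling: var(z) is about 1 / (n - 3) under H0
  df <- k - q
  p_value <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p_value)
}

# Made-up symmetric correlation matrix with small off-diagonal values.
R_example <- matrix(c( 1.0,  0.1, -0.1,
                       0.1,  1.0,  0.2,
                      -0.1,  0.2,  1.0), nrow = 3, byrow = TRUE)

res <- steiger_identity_test(R_example, n = 50)
print(res)
```

With these made-up values the statistic is small relative to a chi-square with 3 degrees of freedom, so the identity hypothesis is not rejected at the usual 5% level. Passing an exact identity matrix such as `diag(3)` to this sketch gives a statistic of 0 and a p-value of 1, because every off-diagonal z is 0.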
