Solved – Sufficient and necessary conditions for zero eigenvalue of a correlation matrix

Tags: correlation, covariance-matrix, linear-algebra

Given $n$ random variables $X_i$ with joint probability distribution $P(X_1,\ldots,X_n)$, the correlation matrix $C_{ij}=E[X_i X_j]-E[X_i]E[X_j]$ is positive semi-definite, i.e. its eigenvalues are nonnegative.

I am interested in the conditions on $P$ that are necessary and/or sufficient for $C$ to have $m$ zero eigenvalues. For instance, a sufficient condition is that the random variables are linearly dependent: $\sum_i u_i X_i=0$ for some real numbers $u_i$, not all zero. For example, if $P(X_1,\ldots,X_n)=\delta(X_1-X_2)\,p(X_2,\ldots,X_n)$, then $\vec u=(1,-1,0,\ldots,0)$ is an eigenvector of $C$ with zero eigenvalue. If we have $m$ independent linear constraints of this type on the $X_i$'s, they imply $m$ zero eigenvalues.
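To make this concrete, here is a minimal numerical sketch (my illustration, not part of the original question), assuming NumPy and using sample covariances as a stand-in for the true ones: the data are simulated so that $X_1=X_2$, and the empirical covariance matrix is checked to annihilate $\vec u=(1,-1,0,\ldots,0)$.

```python
# Minimal sketch (sample covariances stand in for the true ones): enforce the
# linear constraint X1 = X2 and check that u = (1, -1, 0, ..., 0) lies in the
# kernel of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100_000, 4))       # columns play the role of X2..X5
data = np.column_stack([Z[:, 0], Z])        # prepend X1 = X2

C = np.cov(data, rowvar=False)              # empirical covariance matrix
u = np.array([1.0, -1.0, 0.0, 0.0, 0.0])

print(C @ u)                                # ~ 0: u is an eigenvector with eigenvalue 0
print(np.linalg.eigvalsh(C)[0])             # smallest eigenvalue ~ 0
```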

There is at least one additional (but trivial) possibility, when $X_a=E[X_a]$ almost surely for some $a$ (i.e. $P(X_1,\ldots,X_n)\propto\delta(X_a-E[X_a])$), since in that case $C$ has a zero row and a zero column: $C_{ia}=C_{ai}=0$ for all $i$. As this case is not really interesting, I assume the probability distribution is not of that form.
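The same kind of sketch (again just an illustration with simulated data) shows this trivial case: a constant variable produces a zero row and a zero column in the empirical covariance matrix.

```python
# Illustrative sketch: if X_1 is (almost surely) constant, the corresponding
# row and column of the covariance matrix vanish identically.
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((10_000, 3))
data[:, 0] = 2.5                            # X_1 is the constant 2.5

C = np.cov(data, rowvar=False)
print(np.allclose(C[0, :], 0.0), np.allclose(C[:, 0], 0.0))   # True True
```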

My question is: are linear constraints the only way to induce zero eigenvalues (if we forbid the trivial exception given above), or can non-linear constraints on the random variables also generate zero eigenvalues of $C$?

Best Answer

Perhaps by simplifying the notation we can bring out the essential ideas. It turns out we don't need to involve expectations or complicated formulas, because everything is purely algebraic.


The algebraic nature of the mathematical objects

The question concerns relationships between (1) the covariance matrix of a finite set of random variables $X_1, \ldots, X_n$ and (2) linear relations among those variables, considered as vectors.

The vector space in question is the set of all finite-variance random variables (on any given probability space $(\Omega,\mathbb P)$) modulo the subspace of almost surely constant variables, denoted $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R.$ (That is, we consider two random variables $X$ and $Y$ to be the same vector when there is zero chance that $X-Y$ differs from its expectation.) We are dealing only with the finite-dimensional vector space $V$ generated by the $X_i,$ which is what makes this an algebraic problem rather than an analytic one.
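As a quick illustration of this identification (a sketch with simulated data and sample covariances): adding a constant to a variable changes no covariance, which is why $X$ and $X+c$ count as the same vector in $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R.$

```python
# Sketch: covariances are unchanged by constant shifts, so variables that
# differ by a constant behave as the same vector for our purposes.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50_000, 3))
X_shifted = X + np.array([10.0, -3.0, 0.5])     # add a constant to each variable

print(np.allclose(np.cov(X, rowvar=False),
                  np.cov(X_shifted, rowvar=False)))   # True
```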

What we need to know about variances

$V$ is more than just a vector space: it is a quadratic module, because it comes equipped with the variance. All we need to know about the variance comes down to two things:

  1. The variance is a scalar-valued function $Q$ with the property that $Q(aX)=a^2Q(X)$ for all vectors $X$ and all real numbers $a.$

  2. The variance is nondegenerate.

The second needs some explanation. $Q$ determines a "dot product," which is a symmetric bilinear form given by

$$X\cdot Y = \frac{1}{4}\left(Q(X+Y) - Q(X-Y)\right).$$

(This is of course nothing other than the covariance of the variables $X$ and $Y.$) Vectors $X$ and $Y$ are orthogonal when their dot product is $0.$ The orthogonal complement of any set of vectors $\mathcal A \subset V$ consists of all vectors orthogonal to every element of $\mathcal A,$ written

$$\mathcal{A}^0 = \{v\in V\mid a \cdot v = 0\text{ for all }a \in \mathcal A\}.$$

It is clearly a vector space. By definition, $Q$ is nondegenerate when $V^0 = \{0\}.$
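Here is a quick numerical check of the polarization identity displayed above (an illustrative sketch with sample moments; `ddof=1` keeps both sides on the same normalization):

```python
# Sketch: the sample covariance of X and Y equals (Var(X+Y) - Var(X-Y)) / 4,
# provided both sides use the same normalization (ddof=1 here).
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(100_000)
Y = 0.6 * X + rng.standard_normal(100_000)   # make Y correlated with X

lhs = np.cov(X, Y)[0, 1]
rhs = 0.25 * (np.var(X + Y, ddof=1) - np.var(X - Y, ddof=1))
print(np.isclose(lhs, rhs))                  # True
```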

Allow me to prove that the variance is indeed nondegenerate, even though it might seem obvious. Suppose $X$ is a nonzero element of $V^0.$ This means $X\cdot Y = 0$ for all $Y\in V;$ equivalently,

$$Q(X+Y) = Q(X-Y)$$

for all vectors $Y.$ Taking $Y=X$ gives

$$4 Q(X) = Q(2X) = Q(X+X) = Q(X-X) = Q(0) = 0$$

and thus $Q(X)=0.$ However, we know (using Chebyshev's Inequality, perhaps) that the only random variables with zero variance are almost surely constant, which identifies them with the zero vector in $V.$ This contradicts the assumption that $X$ is nonzero, QED.

Interpreting the questions

Returning to the questions, in the preceding notation the covariance matrix of the random variables is just a regular array of all their dot products,

$$T = (X_i\cdot X_j).$$

There is a good way to think about $T$: it defines a linear transformation on $\mathbb{R}^n$ in the usual way, by sending any vector $x=(x_1, \ldots, x_n)\in\mathbb{R}^n$ to the vector $T(x)=y=(y_1, \ldots, y_n)$ whose $i^\text{th}$ component is given by the matrix multiplication rule

$$y_i = \sum_{j=1}^n (X_i\cdot X_j)x_j.$$
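Concretely (an illustrative sketch, with finite samples standing in for the random variables): $T$ is just the Gram matrix of the mean-centered data columns, and $T(x)$ is an ordinary matrix-vector product.

```python
# Sketch: with sample data, the covariance matrix T is the Gram matrix of the
# mean-centered columns (divided by n - 1), and T(x) is a matrix-vector product.
import numpy as np

rng = np.random.default_rng(4)
data = rng.standard_normal((20_000, 4))

centered = data - data.mean(axis=0)
T = centered.T @ centered / (len(data) - 1)          # array of dot products X_i . X_j
print(np.allclose(T, np.cov(data, rowvar=False)))    # True

x = np.array([1.0, 2.0, -1.0, 0.5])
print(T @ x)                                         # y_i = sum_j (X_i . X_j) x_j
```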

The kernel of this linear transformation is the subspace it sends to zero:

$$\operatorname{Ker}(T) = \{x\in \mathbb{R}^n\mid T(x)=0\}.$$

The foregoing equation implies that when $x\in \operatorname{Ker}(T),$ for every $i$

$$0 = y_i = \sum_{j=1}^n (X_i\cdot X_j)x_j = X_i \cdot \left(\sum_j x_j X_j\right).$$

Since this is true for every $i,$ the vector $\sum_j x_j X_j$ is orthogonal to every linear combination of the $X_i$: that is, to all of $V.$ Consequently, when $x\in\operatorname{Ker}(T),$ the vector $\sum_j x_j X_j$ lies in $V^0.$ Because the variance is nondegenerate, this means $\sum_j x_j X_j = 0.$ That is, $x$ describes a linear dependency among the $n$ original random variables.

You can readily check that this chain of reasoning is reversible:

Linear dependencies among the $X_j$ as vectors are in one-to-one correspondence with elements of the kernel of $T.$

(Remember, this statement still considers the $X_j$ as defined up to a constant shift in location--that is, as elements of $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R$--rather than as just random variables.)
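This correspondence is easy to see numerically (a sketch with simulated data containing one exact linear dependency, constant shift included): the eigenvector of $T$ with zero eigenvalue supplies coefficients $x_j$ for which $\sum_j x_j X_j$ is almost surely constant.

```python
# Sketch: build data with a hidden linear dependency, X4 = X1 + 2*X2 - X3 + 7,
# then verify that the null direction of the covariance matrix recovers it.
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((50_000, 3))
X4 = Z @ np.array([1.0, 2.0, -1.0]) + 7.0    # constant shift is allowed
data = np.column_stack([Z, X4])

T = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(T)         # eigenvalues in ascending order
x = eigvecs[:, 0]                            # eigenvector of the ~zero eigenvalue

print(eigvals[0])                            # ~ 0
print((data @ x).std())                      # ~ 0: sum_j x_j X_j is constant
```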

Finally, by definition, an eigenvalue of $T$ is any scalar $\lambda$ for which there exists a nonzero vector $x$ with $T(x) = \lambda x.$ When $\lambda=0$ is an eigenvalue, the space of associated eigenvectors is (obviously) the kernel of $T.$


Summary

We have arrived at the answer to the questions: the set of linear dependencies of the random variables, qua elements of $\mathcal{L}^2(\Omega,\mathbb P)/\mathbb R,$ corresponds one-to-one with the kernel of their covariance matrix $T.$ This is so because the variance is a nondegenerate quadratic form. The kernel also is the eigenspace associated with the zero eigenvalue (or just the zero subspace when there is no zero eigenvalue).


Reference

I have largely adopted the notation and some of the language of Chapter IV in

Jean-Pierre Serre, A Course in Arithmetic. Springer-Verlag, 1973.