Multicollinearity – Determinant of the Correlation Matrix Equals Zero

Tags: correlation, determinant, multicollinearity, multiple regression

I'm studying Linear Models again, after finishing my degree some years ago. I found in my old notes that, according to my professor, one can check for multicollinearity by calculating the determinant of the sample correlation matrix of the variables $X_1,\ldots,X_p$: if it is close to 0, there is a multicollinearity problem.

I therefore suspected that
$$\text{There is exact multicollinearity between } X_1,\ldots,X_p \iff \det(\operatorname{Cor}(X_1,\ldots,X_p))=0.$$

However, I have not found a proof in my reference books, nor on Cross Validated. I have tried several approaches, but with no result yet. Could someone please help me find a proof? Is this result true?

Please note that, in the case of simple exact collinearity, there exist $X_i, X_j$ such that $\operatorname{Cor}(X_i,X_j)=1$, which happens if and only if the correlation matrix has two equal rows, and hence its determinant is 0. I am not interested in this case. I am assuming that the rows of the correlation matrix are all different, but its determinant is 0.
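For concreteness, here is a minimal R sketch of that pairwise case (the variables a, b and u below are my own hypothetical illustration, not part of the original notes):

set.seed(2)
a = runif(100, 0, 1)
b = 5*a - 3                            # Cor(a, b) = 1: exact pairwise collinearity
u = runif(100, 0, 1)                   # an unrelated third variable
R_pair = cor(data.frame(A = a, B = b, U = u))
R_pair                                 # rows A and B are identical
det(R_pair)                            # 0 up to floating-point error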

Here is an R example:

set.seed(1)
x = runif(300,0,1)
y = 2*runif(300,0,1)
z = 1+3*x+2*y   #Exact multicollinearity
data_multicol = data.frame(X = x, Y = y, Z = z)

cor(data_multicol)
det(cor(data_multicol))

The outputs are

> cor(data_multicol)
           X          Y         Z
X 1.00000000 0.01221708 0.5847432
Y 0.01221708 1.00000000 0.8183018
Z 0.58474316 0.81830180 1.0000000

> det(cor(data_multicol))
[1] 2.220115e-16
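
Note that the determinant is zero only up to floating-point error. A further check, which I am adding here and which is not in my old notes, uses the fact that the determinant of the correlation matrix is the product of its eigenvalues, so exact multicollinearity also shows up as a smallest eigenvalue that is numerically zero:

eigen(cor(data_multicol))$values        # the smallest eigenvalue is numerically 0
prod(eigen(cor(data_multicol))$values)  # should agree with det(cor(data_multicol)) up to rounding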

Best Answer

The result is true, but I will only sketch out why below.

  • Consider that the covariance and correlation matrices are Gram matrices, $G$.

  • Recall that the covariance $\mathbb{E}[(X_i - \mathbb{E}[X_i])(X_j - \mathbb{E}[X_j])] = \int (x_i - \mu_i)(x_j - \mu_j)\, f(x_i,x_j)\,dx_i\,dx_j$ is an inner product (on the space of centered, square-integrable random variables).

  • The correlation is a normalized covariance.

  • A collection of realizations of a random variable maps naturally to a vector, because each realization corresponds to a particular outcome $\omega$ in the sample space $\Omega$.

  • The determinant of such a matrix, $\det G$, is called the Gram determinant or Gramian.

  • It is known that $\det G = 0 \iff$ the vectors are linearly dependent.

  • Exact multicollinearity is defined as precisely such a linear dependence; a short numerical check of this chain of identifications follows this list.
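
As that numerical sanity check (my own addition, reusing data_multicol from the question), one can verify that the correlation matrix is exactly the Gram matrix of the standardized data columns, and that those columns are linearly dependent:

Z_std = scale(data_multicol)                   # center and scale each column
G = t(Z_std) %*% Z_std / (nrow(Z_std) - 1)     # Gram matrix of the standardized columns
all.equal(G, cor(data_multicol), check.attributes = FALSE)   # TRUE: they coincide
qr(Z_std)$rank                                 # rank 2 < 3: columns (numerically) linearly dependent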
