[Math] Bounds on off-diagonal entries of a correlation matrix

correlationmatricesprobability

Assume that all the entries of an $n \times n$ correlation matrix which are not on the main
diagonal are equal to $q$. Find upper and lower bounds on the possible values of $q$.

I know that the matrix should be positive semidefinite but how to proceed to get the upper and lower bounds?

Thanks!

Best Answer

Consider $n$ unit-variance random variables $X_1, X_2, \ldots X_n$ with the property that $\operatorname{cov}(X_i,X_j) = q$ for all $i \neq j$. Then, the covariance matrix of these random variables is the same as the correlation matrix. Now $$\begin{align*} \operatorname{var}(X_1+X_2+\cdots+X_n) &= \sum_{i=1}^n \operatorname{var}(X_i) + 2\sum_{i=1}^n\sum_{j=i+1}^n\operatorname{cov}(X_i,X_j)\tag{1}\\ &= n + n(n-1)q\\ &\geq 0 \end{align*}$$ and so it must be that $$q \geq -\frac{1}{n-1}$$ as Michael Hardy noted in a succinct comment on the question. The upper bound is, of course, $q \leq 1$. Both bounds are achievable. Obviously, if all the $X_i$ are the same random variable $X$, then $q = 1$. For the lower bound, suppose that the $X_i$ are independent unit-variance random variables so that they enjoy the desired constant correlation with $q=0$. For each $i$, set $Y_i = X_i-\bar{X}$ where $$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.$$ Then, $$\operatorname{var}(Y_i) = \left(\frac{n-1}{n}\right)^2 + (n-1)\left(\frac{1}{n}\right)^2 = \frac{n-1}{n}$$ while for $i \neq j$, $$\begin{align} \operatorname{cov}(Y_i,Y_j) &= \operatorname{cov}(X_i - \bar{X}, Y_j- \bar{X})\\ &= \operatorname{cov}(X_i,X_j) - \operatorname{cov}(X_i,\bar{X}) - \operatorname{cov}(X_j,\bar{X})+ \operatorname{var}(\bar{X})\\ &= 0 - \frac{1}{n} - \frac{1}{n} + \frac{1}{n}\\ &= -\frac{1}{n} \end{align}$$ showing that all the correlation coefficients do indeed have the minimum value $$ \frac{-1/n}{\sqrt{(n-1)/n}\sqrt{(n-1)/n}} = -\frac{1}{n-1}.$$


Returning to $(1)$, note that if the correlation coefficients are not required to all have the same value, then from $(1)$, we get that the sum of the $n(n-1)$ correlations must be at least $-n$. Thus, the average of the $n(n-1)$ correlations is at least $-1/(n-1)$ and since at least one correlation must be as large as the average, we can assert that

In any collection of $n$ random variables $X_1, X_2, \ldots, X_n$ with finite variance, there must be at least one pair of random variables $(X_i,X_j)$ (with $i\neq j$) for which $$\operatorname{cov}(X_i,X_j) \geq -\frac{1}{n-1}$$

Related Question