Solved – Is a sample covariance matrix always symmetric and positive definite?

Tags: covariance, sampling

When computing the covariance matrix of a sample, is one guaranteed to get a symmetric, positive-definite matrix?

My current problem has a sample of 4600 observation vectors in 24 dimensions.

Best Answer

For a sample of vectors $x_i=(x_{i1},\dots,x_{ik})^\top$, with $i=1,\dots,n$, the sample mean vector is $$ \bar{x}=\frac{1}{n} \sum_{i=1}^n x_i \, , $$ and the sample covariance matrix is $$ Q = \frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top \, . $$ Each term $(x_i-\bar{x})(x_i-\bar{x})^\top$ is symmetric, because $(vv^\top)^\top = vv^\top$ for any vector $v$, so $Q$ is always symmetric. For any vector $y\in\mathbb{R}^k$, we have $$ y^\top Qy = y^\top\left(\frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top\right) y $$ $$ = \frac{1}{n} \sum_{i=1}^n y^\top (x_i-\bar{x})(x_i-\bar{x})^\top y $$ $$ = \frac{1}{n} \sum_{i=1}^n \left( (x_i-\bar{x})^\top y \right)^2 \geq 0 \, . \quad (*) $$ Therefore, $Q$ is always positive semi-definite. Replacing the factor $1/n$ by the unbiased $1/(n-1)$ only rescales $Q$ by a positive constant, so neither conclusion changes.
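
The derivation can be checked numerically. The following is a minimal pure-Python sketch, assuming a small synthetic Gaussian sample (hypothetical, not the asker's 4600 × 24 data) and the $1/n$ normalization used above; the helper names `center`, `sample_covariance`, `quad_form` and `sum_of_squares_form` are illustrative only. It builds $Q$ entry by entry, confirms that it is symmetric, and compares $y^\top Q y$ with the sum-of-squares form $(*)$ for a few random $y$.

```python
import random

def center(xs):
    """Return z_i = x_i - xbar for a list of k-dimensional observations."""
    n, k = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / n for j in range(k)]
    return [[x[j] - xbar[j] for j in range(k)] for x in xs]

def sample_covariance(xs):
    """Q = (1/n) * sum_i z_i z_i^T, using the 1/n normalization from the answer."""
    zs = center(xs)
    n, k = len(zs), len(zs[0])
    return [[sum(z[a] * z[b] for z in zs) / n for b in range(k)] for a in range(k)]

def quad_form(Q, y):
    """y^T Q y computed directly from the matrix entries."""
    k = len(y)
    return sum(y[a] * Q[a][b] * y[b] for a in range(k) for b in range(k))

def sum_of_squares_form(xs, y):
    """Right-hand side of (*): (1/n) * sum_i (z_i^T y)^2, nonnegative by construction."""
    zs = center(xs)
    return sum(sum(zj * yj for zj, yj in zip(z, y)) ** 2 for z in zs) / len(zs)

random.seed(0)
n, k = 200, 5  # hypothetical small synthetic sample
xs = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
Q = sample_covariance(xs)

# Symmetry: Q[a][b] and Q[b][a] sum the same products z[a]*z[b] in the same
# order, so they are equal exactly, not just up to rounding.
is_symmetric = all(Q[a][b] == Q[b][a] for a in range(k) for b in range(k))

for _ in range(5):
    y = [random.uniform(-1.0, 1.0) for _ in range(k)]
    print(quad_form(Q, y), sum_of_squares_form(xs, y))
print("symmetric:", is_symmetric)
```

Plain lists keep the sketch dependency-free; with NumPy one would typically use `numpy.cov` (whose default normalization is $1/(n-1)$) together with `numpy.linalg.eigvalsh` and check that all eigenvalues are nonnegative.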

The additional condition for $Q$ to be positive definite was given in whuber's comment below. It goes as follows.

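Define $z_i=x_i-\bar{x}$, for $i=1,\dots,n$. For any nonzero $y\in\mathbb{R}^k$, $(*)$ is zero if and only if $z_i^\top y=0$ for each $i=1,\dots,n$. Suppose the set $\{z_1,\dots,z_n\}$ spans $\mathbb{R}^k$ and that $(*)$ is zero for some nonzero $y$. Then there are real numbers $\alpha_1,\dots,\alpha_n$ such that $y=\alpha_1 z_1 +\dots+\alpha_n z_n$. But then $y^\top y=\alpha_1 z_1^\top y + \dots +\alpha_n z_n^\top y=0$, so $y=0$, a contradiction. Hence, if the $z_i$'s span $\mathbb{R}^k$, then $Q$ is positive definite. Conversely, if they do not span $\mathbb{R}^k$, any nonzero $y$ orthogonal to their span makes $(*)$ zero, so $Q$ is positive definite if and only if $\mathrm{rank} [z_1 \dots z_n] = k$. Because $z_1+\dots+z_n=0$, this rank is at most $n-1$, so positive definiteness requires $n \geq k+1$. With $n=4600$ and $k=24$ that is not a constraint, and $Q$ fails to be positive definite only if some nontrivial linear combination of the 24 variables is constant across the sample.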
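
The rank condition can be illustrated the same way. The sketch below is again pure Python with synthetic data and hypothetical helper names (`center`, `numerical_rank`, `quad_form_via_z`). It computes a numerical rank of the centered data by Gaussian elimination with partial pivoting, where the absolute tolerance `tol=1e-9` is an assumption suited to data of order one, for three cases: a generic sample with $n>k$, a sample in which one coordinate is an exact linear combination of two others, and a sample with $n\le k$.

```python
import random

def center(xs):
    """Return z_i = x_i - xbar for each observation x_i."""
    n, k = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / n for j in range(k)]
    return [[x[j] - xbar[j] for j in range(k)] for x in xs]

def numerical_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) by Gaussian elimination with partial pivoting.

    The absolute tolerance `tol` is an assumption suited to O(1) data.
    """
    m = [list(row) for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[p][c]) <= tol:
            continue  # no usable pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r

def quad_form_via_z(zs, y):
    """(1/n) * sum_i (z_i^T y)^2, i.e. y^T Q y as in (*)."""
    return sum(sum(a * b for a, b in zip(z, y)) ** 2 for z in zs) / len(zs)

random.seed(1)
k = 4

# Generic sample with n > k: the rank should be k, so Q is positive definite.
generic = [[random.gauss(0, 1) for _ in range(k)] for _ in range(100)]
z_generic = center(generic)

# Degenerate sample: the third coordinate is x1 + 2*x2, so y = (1, 2, -1, 0)
# gives z_i^T y = 0 for every i and y^T Q y = 0 (up to rounding).
degenerate = []
for _ in range(100):
    a, b, d = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    degenerate.append([a, b, a + 2 * b, d])
z_degenerate = center(degenerate)
y_null = [1.0, 2.0, -1.0, 0.0]

# Too few observations: the z_i sum to zero, so rank <= n - 1 < k when n <= k.
few = [[random.gauss(0, 1) for _ in range(k)] for _ in range(3)]
z_few = center(few)

print("generic rank:", numerical_rank(z_generic))
print("degenerate rank:", numerical_rank(z_degenerate))
print("y^T Q y for y_null:", quad_form_via_z(z_degenerate, y_null))
print("rank with n = 3:", numerical_rank(z_few))
```

In floating point, "rank" always depends on a tolerance; for real data it is often more informative to compare the smallest eigenvalue of $Q$ with the largest than to ask for an exact rank.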