I guess the answer should be yes, but I still feel something is not right. There should be some general results in the literature, could anyone help me?

# Covariance Matrix – Is Every Covariance Matrix Positive Definite?

Tags: covariance, covariance-matrix, linear-algebra, matrix

#### Related Solutions

A correlation matrix is really the covariance matrix of a bunch of variables which have been rescaled to have variance one.

But every population covariance matrix is positive semi-definite, and, if we rule out weird cases (such as missing data, or "numerical fuzz" turning a small eigenvalue negative), so is every sample covariance matrix.

So if a matrix is supposed to be a correlation matrix, it should be positive semi-definite.

Note that the *semi*-definite is important here. In the bivariate case, take your two variables to be perfectly positively correlated and then the correlation matrix is $\pmatrix{1 & 1 \\ 1& 1}$ which has eigenvalues of $2$ and $0$: the zero eigenvalue means it is *not* positive definite.
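A quick numerical check of this bivariate example (a sketch assuming NumPy is available):

```python
import numpy as np

# Correlation matrix of two perfectly positively correlated variables.
R = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# R is symmetric, so eigvalsh returns real eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(R)
print(eigenvalues)  # ~[0, 2]: the zero eigenvalue rules out positive definiteness
```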

(The answer below merely introduces and states the theorem proven in [0]. The beauty of that paper is that most of the arguments are made in terms of basic linear algebra. To answer this question it will be enough to state the main results, but by all means, go check the original source.)

In any situation where the multivariate pattern of the data can be described by a $k$-variate elliptical distribution, statistical inference will, by definition, reduce to the problem of fitting (and characterizing) a $k$-variate location vector (say $\boldsymbol\theta$) and a $k\times k$ symmetric positive semi-definite (SPSD) matrix (say $\boldsymbol\varSigma$) to the data. For reasons explained below (which are assumed as premises), it will often be more meaningful to decompose $\boldsymbol\varSigma$ into a shape component (an SPSD matrix of the same size as $\boldsymbol\varSigma$) accounting for the shape of the density contours of the multivariate distribution, and a scalar $\sigma_S$ expressing the scale of those contours.

For univariate data ($k=1$), the covariance matrix $\boldsymbol\varSigma$ is a scalar and, as will follow from the discussion below, its shape component is 1, so that $\boldsymbol\varSigma$ always equals its scale component ($\boldsymbol\varSigma=\sigma_S$) and no ambiguity is possible.

For multivariate data, there are many possible choices of scaling function $\sigma_S$. One in particular ($\sigma_S=|\pmb\varSigma|^{1/k}$) stands out in having a key desirable property, making it the preferred choice of scaling function in the context of elliptical families.

Many problems in MV-statistics involve estimation of a scatter matrix, defined as a function(al) SPSD matrix in $\mathbb{R}^{k\times k}$ ($\boldsymbol\varSigma$) satisfying:

$$(0)\quad\boldsymbol\varSigma(\boldsymbol A\boldsymbol X+\boldsymbol b)=\boldsymbol A\boldsymbol\varSigma(\boldsymbol X)\boldsymbol A^\top$$ for nonsingular matrices $\boldsymbol A$ and vectors $\boldsymbol b$. For example, the classical covariance estimator satisfies (0), but it is by no means the only one.
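Property (0) can be checked numerically for the classical sample covariance (a sketch assuming NumPy and simulated data; the dimensions and seed are arbitrary): transforming each observation by $x \mapsto \boldsymbol A x + \boldsymbol b$ conjugates the covariance matrix by $\boldsymbol A$, and the shift $\boldsymbol b$ drops out.

```python
import numpy as np

rng = np.random.default_rng(0)

# n observations of a k-variate X (rows are observations).
n, k = 500, 3
X = rng.standard_normal((n, k))

A = rng.standard_normal((k, k))  # nonsingular with probability 1
b = rng.standard_normal(k)

# Transform each observation: x -> A x + b.
Y = X @ A.T + b

cov_X = np.cov(X, rowvar=False)
cov_Y = np.cov(Y, rowvar=False)

# Property (0): Sigma(A X + b) = A Sigma(X) A^T; the shift b drops out.
assert np.allclose(cov_Y, A @ cov_X @ A.T)
```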

For elliptically distributed data, where all the density contours are ellipses defined by the same shape matrix up to multiplication by a scalar, it is natural to consider normalized versions of $\boldsymbol\varSigma$ of the form:

$$\boldsymbol V_S = \boldsymbol\varSigma / S(\boldsymbol\varSigma)$$

where $S$ is a 1-homogeneous function satisfying:

$$(1)\quad S(\lambda \boldsymbol\varSigma)=\lambda S(\boldsymbol\varSigma) $$

for all $\lambda>0$. Then $\boldsymbol V_S$ is called the shape component of the scatter matrix (shape matrix for short) and $\sigma_S=S^{1/2}(\boldsymbol\varSigma)$ is called its scale component. Examples of multivariate estimation problems where the loss function depends on $\boldsymbol\varSigma$ only through its shape component $\boldsymbol V_S$ include tests of sphericity, PCA and CCA, among others.

Of course, there are many possible scaling functions, so this still leaves open the question of which (if any) of the several choices of normalization function $S$ is in some sense optimal. For example:

- $S=\text{tr}(\boldsymbol\varSigma)/k$ (for example, the one proposed by @amoeba in his comment below the OP's question, as well as in @HelloGoodbye's answer below; see also [1], [2], [3])
- $S=|\boldsymbol\varSigma|^{1/k}$ ([4], [5], [6], [7], [8])
- $S=\boldsymbol\varSigma_{11}$ (the first entry of the covariance matrix)
- $S=\lambda_1(\boldsymbol\varSigma)$ (the largest eigenvalue of $\boldsymbol\varSigma$); this is the spectral norm, discussed in @Aksakal's answer below
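The four candidate scaling functions can be computed directly. A minimal sketch (assuming NumPy and a generic symmetric positive definite $\boldsymbol\varSigma$ built for illustration), showing the feature each normalization fixes in the resulting shape matrix $\boldsymbol V_S = \boldsymbol\varSigma / S(\boldsymbol\varSigma)$:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
B = rng.standard_normal((k, k))
Sigma = B @ B.T + k * np.eye(k)  # a generic symmetric positive definite Sigma

# The four scaling functions listed above.
scalings = {
    "trace":    np.trace(Sigma) / k,              # S = tr(Sigma)/k
    "det":      np.linalg.det(Sigma) ** (1 / k),  # S = |Sigma|^(1/k)
    "first":    Sigma[0, 0],                      # S = Sigma_11
    "spectral": np.linalg.eigvalsh(Sigma)[-1],    # S = lambda_1(Sigma)
}

# Shape matrices V_S = Sigma / S(Sigma); each normalization pins down a
# different feature of V_S.
V = {name: Sigma / s for name, s in scalings.items()}

assert np.isclose(np.trace(V["trace"]), k)        # unit average eigenvalue
assert np.isclose(np.linalg.det(V["det"]), 1.0)   # unit determinant
assert np.isclose(V["first"][0, 0], 1.0)          # unit (1,1) entry
assert np.isclose(np.linalg.eigvalsh(V["spectral"])[-1], 1.0)  # unit largest eigenvalue
```

In particular, the determinant-based choice yields a shape matrix with $|\boldsymbol V_S| = 1$, which is the normalization singled out in [0].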

Among these, $S=|\boldsymbol\varSigma|^{1/k}$ is the only scaling function for which the Fisher information matrix for the corresponding estimates of scale and shape, in locally asymptotically normal families, is block diagonal (that is, the scale and shape components of the estimation problem are asymptotically orthogonal) [0]. This means, among other things, that $S=|\boldsymbol\varSigma|^{1/k}$ is the only choice of $S$ for which leaving $\sigma_S$ unspecified causes no loss of efficiency when performing inference on $\boldsymbol V_S$.

I do not know of any comparably strong optimality characterization for any of the many possible choices of $S$ that satisfy (1).

- [0] Paindaveine, D. (2008). A canonical definition of shape, Statistics & Probability Letters 78(14), 2240–2247.
- [1] Dumbgen, L. (1998). On Tyler's M-functional of scatter in high dimension, Ann. Inst. Statist. Math. 50, 471–491.
- [2] Ollila, E., T. P. Hettmansperger, and H. Oja (2004). Affine equivariant multivariate sign methods. Preprint, University of Jyvaskyla.
- [3] Tyler, D. E. (1983). Robustness and efficiency properties of scatter matrices, Biometrika 70, 411–420.
- [4] Dumbgen, L., and D. E. Tyler (2005). On the breakdown properties of some multivariate M-functionals, Scand. J. Statist. 32, 247–264.
- [5] Hallin, M., and D. Paindaveine (2008). Optimal rank-based tests for homogeneity of scatter, Ann. Statist., to appear.
- [6] Salibian-Barrera, M., S. Van Aelst, and G. Willems (2006). Principal components analysis based on multivariate MM-estimators with fast and robust bootstrap, J. Amer. Statist. Assoc. 101, 1198–1211.
- [7] Taskinen, S., C. Croux, A. Kankainen, E. Ollila, and H. Oja (2006). Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices, J. Multivariate Anal. 97, 359–384.
- [8] Tatsuoka, K. S., and D. E. Tyler (2000). On the uniqueness of S-functionals and M-functionals under nonelliptical distributions, Ann. Statist. 28, 1219–1243.

## Best Answer

No.

Consider three variables, $X$, $Y$ and $Z = X+Y$. Their covariance matrix, $M$, is not positive definite, since there is a vector $z$ ($=(1, 1, -1)'$) for which $z'Mz$ is not positive: $z'Mz = \operatorname{Var}(X + Y - Z) = \operatorname{Var}(0) = 0$.
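A numerical sketch of this construction (assuming NumPy and simulated data; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
z = x + y                        # Z = X + Y: an exact linear dependence

M = np.cov(np.column_stack([x, y, z]), rowvar=False)

v = np.array([1.0, 1.0, -1.0])   # the direction of the dependence
q = v @ M @ v                    # quadratic form v' M v

# Algebraically q = Var(X + Y - Z) = Var(0) = 0, so M is singular:
# positive semi-definite at best, never positive definite.
print(q)  # ~0 up to floating-point rounding
```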

Population covariance matrices are positive semi-definite.

(See property 2 here.)

The same should generally apply to covariance matrices of complete samples (no missing values), since a sample covariance matrix is itself the covariance matrix of the discrete empirical distribution of the sample.

However, due to the inexactness of floating-point computation, even cases that are algebraically positive definite may occasionally be computed as not even positive semi-definite; a good choice of algorithm can help with this.

More generally, sample covariance matrices - depending on how they deal with missing values in some variables - may or may not be positive semi-definite, even in theory. If pairwise deletion is used, for example, then there's no guarantee of positive semi-definiteness. Further, accumulated numerical error can cause sample covariance matrices that should be notionally positive semi-definite to fail to be.

For example, simulating a small dataset in which one variable is an exact sum of two others, the smallest eigenvalue of the sample covariance matrix came out negative on the first example I tried (I probably should supply a seed, but it is not so rare that you should have to try many examples before you get one), even though algebraically it is exactly zero. A different set of numbers might yield a tiny positive number or an "exact" zero.
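A hedged reconstruction of that experiment (assuming NumPy and simulated normal data; the original code is not shown here, and deliberately no seed is set, so the exact value varies from run to run):

```python
import numpy as np

rng = np.random.default_rng()    # deliberately unseeded: results vary per run
x = rng.standard_normal(30)
y = rng.standard_normal(30)
z = x + y                        # exact linear dependence among the columns

M = np.cov(np.column_stack([x, y, z]), rowvar=False)
smallest = np.linalg.eigvalsh(M)[0]

# Algebraically the smallest eigenvalue is exactly 0; in floating point it
# often comes out as a tiny negative number (on the order of 1e-16).
print(smallest)
```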

Even moderate missingness, handled by pairwise deletion, can lead to loss of positive semi-definiteness.
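The following sketch (assuming NumPy; the `pairwise_cov` helper and the data are contrived for clarity, with a more extreme missingness pattern than "moderate") shows pairwise deletion producing a "covariance" matrix with a genuinely negative eigenvalue. Each pair of variables is observed on a different block of rows, and the resulting pairwise covariances are mutually inconsistent:

```python
import numpy as np

nan = np.nan
# Three variables; each pair is observed on a different block of rows.
data = np.array([
    # x1    x2    x3
    [ 1.0,  1.0,  nan],
    [-1.0, -1.0,  nan],
    [ 1.0,  1.0,  nan],
    [-1.0, -1.0,  nan],
    [ 1.0,  nan,  1.0],
    [-1.0,  nan, -1.0],
    [ 1.0,  nan,  1.0],
    [-1.0,  nan, -1.0],
    [ nan,  1.0, -1.0],
    [ nan, -1.0,  1.0],
    [ nan,  1.0, -1.0],
    [ nan, -1.0,  1.0],
])

def pairwise_cov(data):
    """Covariance via pairwise deletion: entry (i, j) uses only the rows
    where both variable i and variable j are observed (NaN = missing)."""
    k = data.shape[1]
    C = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            mask = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi, xj = data[mask, i], data[mask, j]
            C[i, j] = np.mean((xi - xi.mean()) * (xj - xj.mean()))
    return C

C = pairwise_cov(data)
print(np.linalg.eigvalsh(C))  # smallest eigenvalue is -1: not PSD
```

Here the pairwise-deletion matrix works out to $\pmatrix{1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1}$, whose eigenvalues are $2$, $2$ and $-1$: each pairwise entry is a perfectly valid covariance on its own subset of rows, yet no joint distribution can have all three at once.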