Solved – A measure of “variance” from the covariance matrix

covariance, covariance-matrix, variance

If the data is one-dimensional, the variance shows the extent to which the data points differ from each other. If the data is multi-dimensional, we get a covariance matrix instead.

Is there a measure that gives a single number describing how different the data points are from each other, in general, for multi-dimensional data?

I feel that there might be many solutions already, but I'm not sure of the correct term to use to search for them.

Maybe I could do something like adding up the eigenvalues of the covariance matrix; does that sound sensible?
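For concreteness, here is a quick numerical sketch of what I mean (assuming numpy; the data are just made up for illustration). It checks that summing the eigenvalues of the covariance matrix gives the same number as its trace, i.e. the total of the per-coordinate variances:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # 500 observations, 3 variables
cov = np.cov(X, rowvar=False)              # 3 x 3 covariance matrix

eig_sum = np.linalg.eigvalsh(cov).sum()    # sum of the eigenvalues
trace = np.trace(cov)                      # sum of the per-variable variances

print(eig_sum, trace)                      # equal up to floating-point error
```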

Best Answer

(The answer below merely introduces and states the theorem proven in [0]. The beauty of that paper is that most of the arguments are made in terms of basic linear algebra. To answer this question it will be enough to state the main results, but by all means, go check the original source.)

In any situation where the multivariate pattern of the data can be described by a $k$-variate elliptical distribution, statistical inference will, by definition, reduce to the problem of fitting (and characterizing) a $k$-variate location vector (say $\boldsymbol\theta$) and a $k\times k$ symmetric positive semi-definite (SPSD) matrix (say $\boldsymbol\varSigma$) to the data. For reasons explained below (which are assumed as premises), it will often be more meaningful to decompose $\boldsymbol\varSigma$ into a shape component (an SPSD matrix of the same size as $\boldsymbol\varSigma$) accounting for the shape of the density contours of your multivariate distribution, and a scalar $\sigma_S$ expressing the scale of these contours.

In univariate data ($k=1$), $\boldsymbol\varSigma$, the covariance matrix of your data, is a scalar and, as will follow from the discussion below, its shape component is 1, so that $\boldsymbol\varSigma$ always equals its scale component, $\boldsymbol\varSigma=\sigma_S$, and no ambiguity is possible.

In multivariate data, there are many possible choices for the scaling function $\sigma_S$. One in particular ($\sigma_S=|\pmb\varSigma|^{1/k}$) stands out in having a key desirable property, making it the preferred choice of scaling function in the context of elliptical families.


Many problems in multivariate statistics involve the estimation of a scatter matrix, defined as an SPSD-matrix-valued function(al) $\boldsymbol\varSigma$ in $\mathbb{R}^{k\times k}$ satisfying:

$$(0)\quad\boldsymbol\varSigma(\boldsymbol A\boldsymbol X+\boldsymbol b)=\boldsymbol A\boldsymbol\varSigma(\boldsymbol X)\boldsymbol A^\top$$ (for non-singular matrices $\boldsymbol A$ and vectors $\boldsymbol b$). For example, the classical covariance estimate satisfies (0), but it is by no means the only one.
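As a small sanity check (not part of the original answer; assuming numpy and made-up data), the sketch below verifies numerically that the classical sample covariance satisfies the equivariance property (0):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
X = rng.normal(size=(1000, k))                 # rows are observations

A = rng.normal(size=(k, k))                    # a (generically) non-singular matrix
b = rng.normal(size=k)                         # an arbitrary shift vector

cov_X = np.cov(X, rowvar=False)
cov_AXb = np.cov(X @ A.T + b, rowvar=False)    # covariance of the transformed data A x_i + b

# Property (0): Sigma(A X + b) = A Sigma(X) A^T  (the shift b drops out)
print(np.allclose(cov_AXb, A @ cov_X @ A.T))   # True
```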

In the presence of elliptically distributed data, where all the density contours are ellipses defined by the same shape matrix up to multiplication by a scalar, it is natural to consider normalized versions of $\boldsymbol\varSigma$ of the form:

$$\boldsymbol V_S = \boldsymbol\varSigma / S(\boldsymbol\varSigma)$$

where $S$ is a 1-homogeneous function satisfying:

$$(1)\quad S(\lambda \boldsymbol\varSigma)=\lambda S(\boldsymbol\varSigma) $$

for all $\lambda>0$. Then, $\boldsymbol V_S$ is called the shape component of the scatter matrix (in short, the shape matrix) and $\sigma_S=S^{1/2}(\boldsymbol\varSigma)$ is called the scale component of the scatter matrix. Examples of multivariate estimation problems where the loss function depends on $\boldsymbol\varSigma$ only through its shape component $\boldsymbol V_S$ include tests of sphericity, PCA and CCA, among others.
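To make the scale/shape decomposition concrete, here is a minimal sketch (assuming numpy, and using the determinant-based scaling function discussed further below) of computing $\boldsymbol V_S$ and $\sigma_S$, and of checking that $\boldsymbol V_S$ is unaffected by an overall rescaling of the data:

```python
import numpy as np

def scale_det(sigma):
    """Scaling function S(Sigma) = |Sigma|^(1/k); it is 1-homogeneous as in (1)."""
    k = sigma.shape[0]
    return np.linalg.det(sigma) ** (1.0 / k)

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 3))
cov = np.cov(X, rowvar=False)

S = scale_det(cov)
V = cov / S                          # shape matrix  V_S = Sigma / S(Sigma)
sigma_S = np.sqrt(S)                 # scale component  sigma_S = S^(1/2)(Sigma)

# Rescaling the data by a constant changes Sigma (and sigma_S) but not V_S
cov_scaled = np.cov(5.0 * X, rowvar=False)
print(np.allclose(cov_scaled / scale_det(cov_scaled), V))   # True
```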

Of course, there are many possible scaling functions, so this still leaves open the question of which (if any) of the several choices of normalization function $S$ is in some sense optimal (a numerical comparison of the candidates is sketched after the list). For example:

  • $S=\text{tr}(\boldsymbol\varSigma)/k$ (for example, the one proposed by @amoeba in his comment below the OP's question, as well as in @HelloGoodbye's answer below; see also [1], [2], [3])
  • $S=|\boldsymbol\varSigma|^{1/k}$ ([4], [5], [6], [7], [8])
  • $\boldsymbol\varSigma_{11}$ (the first entry of the covariance matrix)
  • $\lambda_1(\boldsymbol\varSigma)$ (the first eigenvalue of $\boldsymbol\varSigma$); this is the spectral norm and is discussed in @Aksakal's answer below.
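As a rough numerical comparison (not from the original answer; assuming numpy and made-up data), the sketch below evaluates each of the four scaling functions listed above on the same covariance matrix and forms the corresponding shape matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3)) @ np.diag([1.0, 2.0, 3.0])   # unequal variances
cov = np.cov(X, rowvar=False)
k = cov.shape[0]

# The four 1-homogeneous scaling functions listed above
scales = {
    "tr(Sigma)/k":     np.trace(cov) / k,
    "|Sigma|^(1/k)":   np.linalg.det(cov) ** (1.0 / k),
    "Sigma_11":        cov[0, 0],
    "lambda_1(Sigma)": np.linalg.eigvalsh(cov)[-1],         # largest eigenvalue
}

for name, s in scales.items():
    V = cov / s                     # shape matrix induced by this choice of S
    print(f"{name:16s} S = {s:8.3f}   tr(V_S) = {np.trace(V):.3f}")
```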

Among these, $S=|\boldsymbol\varSigma|^{1/k}$ is the only scaling function for which the Fisher information matrix for the corresponding estimates of scale and shape, in locally asymptotically normal families, is block diagonal (that is, the scale and shape components of the estimation problem are asymptotically orthogonal) [0]. This means, among other things, that the scale functional $S=|\boldsymbol\varSigma|^{1/k}$ is the only choice of $S$ for which not specifying $\sigma_S$ causes no loss of efficiency when performing inference on $\boldsymbol V_S$.

I do not know of any comparably strong optimality characterization for any of the many possible choices of $S$ that satisfy (1).

  • [0] Paindaveine, D. (2008). A canonical definition of shape, Statist. Probab. Lett. 78, 2240–2247.
  • [1] Dumbgen, L. (1998). On Tyler’s M-functional of scatter in high dimension, Ann. Inst. Statist. Math. 50, 471–491.
  • [2] Ollila, E., T.P. Hettmansperger, and H. Oja (2004). Affine equivariant multivariate sign methods. Preprint, University of Jyvaskyla.
  • [3] Tyler, D.E. (1983). Robustness and efficiency properties of scatter matrices, Biometrika 70, 411–420.
  • [4] Dumbgen, L., and D.E. Tyler (2005). On the breakdown properties of some multivariate M-Functionals, Scand. J. Statist. 32, 247–264.
  • [5] Hallin, M. and D. Paindaveine (2008). Optimal rank-based tests for homogeneity of scatter, Ann. Statist., to appear.
  • [6] Salibian-Barrera, M., S. Van Aelst, and G. Willems (2006). Principal components analysis based on multivariate MM-estimators with fast and robust bootstrap, J. Amer. Statist. Assoc. 101, 1198–1211.
  • [7] Taskinen, S., C. Croux, A. Kankainen, E. Ollila, and H. Oja (2006). Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices, J. Multivariate Anal. 97, 359–384.
  • [8] Tatsuoka, K.S., and D.E. Tyler (2000). On the uniqueness of S-Functionals and M-functionals under nonelliptical distributions, Ann. Statist. 28, 1219–1243.