Solved – standard deviation for multivariate data with correlations

multivariate analysis, standard deviation

In the same way that one may use the standard deviation as a measure of dispersion for univariate data, $\mu \pm z\sigma$, I would like to compute, if possible, its equivalent for multivariate data, taking advantage of the correlations encoded in the covariance matrix. Is there a way of computing a vector equivalent to $\sigma$ for the multivariate case that makes use of the correlations?

Best Answer

There are two things you can do:

  • Project your data onto one variable at a time and calculate the standard deviation of each. This, however, does not take into account correlations between the different variables.

  • If you want to take the correlations into account, the covariance matrix contains this information. To condense it into a vector, you need to find a set of orthogonal coordinates along which the data are uncorrelated. This is what Principal Component Analysis does, for example, with the difference that you would keep all components, not just the largest ones. In this new coordinate system the covariance matrix is diagonal, so its information can be stored in a vector: the square roots of the diagonal entries (the eigenvalues of the original covariance matrix) are the standard deviations along the new axes.
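Both options can be sketched with NumPy. This is only a minimal illustration, not code from the answer: the synthetic dataset `X`, the covariance `true_cov`, and the helper names `per_variable_sigma` and `principal_sigma` are assumptions made up for the example, and `np.linalg.eigh` is just one reasonable way to diagonalise a symmetric covariance matrix.

```python
import numpy as np

def per_variable_sigma(X):
    """Option 1: standard deviation of each column, ignoring correlations."""
    return X.std(axis=0, ddof=1)

def principal_sigma(X):
    """Option 2: standard deviations along the uncorrelated (principal) axes.

    Returns (sigmas, axes): the columns of `axes` are orthonormal directions
    and sigmas[i] is the standard deviation of the data along axes[:, i].
    """
    cov = np.cov(X, rowvar=False)            # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against round-off
    return np.sqrt(eigvals), eigvecs[:, order]

# Hypothetical correlated 2-D sample, for illustration only.
rng = np.random.default_rng(0)
true_cov = np.array([[4.0, 1.8], [1.8, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)

sigma_marginal = per_variable_sigma(X)
sigma_principal, axes = principal_sigma(X)

# In the rotated coordinates the covariance matrix is (numerically) diagonal.
Z = (X - X.mean(axis=0)) @ axes
cov_Z = np.cov(Z, rowvar=False)
```

Keeping all components means the rotation loses nothing: the total variance, i.e. the sum of the squared sigmas, is the same in both coordinate systems. The vector `sigma_principal` is only meaningful together with `axes`, since it describes spread along those directions rather than along the original variables.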

In the end, it depends on what you want to learn from the values of the $\sigma$'s. If you are interested in the range that contains $\sim 68\%$ of your data for a given variable (assuming the projection of the data onto that variable follows a univariate Gaussian distribution), use the first procedure.

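If you want to know (under the assumption that the data follow a multivariate Gaussian distribution) what region contains a known fraction of the data jointly across all $d$ variables, use the second method. The box extending $\pm 1\sigma$ along each uncorrelated axis contains $\sim 0.68^d$ of the data; the one-sigma ellipsoid inscribed in that box contains somewhat less, namely $P(\chi^2_d \le 1)$, which is about $39\%$ for $d = 2$.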
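These coverage fractions can be checked empirically. The sketch below assumes a hypothetical 2-D Gaussian sample (the covariance `cov` and the sample size are arbitrary choices for illustration) and counts the points inside three regions: the $\pm 1\sigma$ interval of one variable, the $\pm 1\sigma$ box in the uncorrelated coordinates, and the one-sigma ellipsoid (Mahalanobis distance at most 1). The box and the ellipsoid are reported separately because they do not hold the same fraction of the data.

```python
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[4.0, 1.8], [1.8, 1.0]])   # hypothetical covariance matrix
X = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

# Rotate into uncorrelated coordinates and scale each axis by its sigma.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
U = ((X - X.mean(axis=0)) @ eigvecs) / np.sqrt(eigvals)

# Fraction within +-1 sigma of the mean for the first original variable.
x0 = X[:, 0]
frac_marginal = np.mean(np.abs(x0 - x0.mean()) <= x0.std(ddof=1))

# Fraction inside the box |u_i| <= 1 along every uncorrelated axis.
frac_box = np.mean(np.all(np.abs(U) <= 1.0, axis=1))

# Fraction inside the one-sigma ellipsoid (Mahalanobis distance <= 1).
frac_ellipsoid = np.mean(np.sum(U**2, axis=1) <= 1.0)

# Reference values for a Gaussian with d = 2.
p1 = 0.6827                               # P(|z| <= 1) for a standard normal
expected_box = p1**2                      # about 0.466
expected_ellipsoid = 1.0 - np.exp(-0.5)   # chi-square CDF, 2 dof, at 1: about 0.393

print(frac_marginal, frac_box, frac_ellipsoid)
```

For other values of $d$ the ellipsoid fraction is the chi-square CDF with $d$ degrees of freedom evaluated at 1; the closed form used here holds only for $d = 2$.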