[Math] 3-sigma Ellipse: why the axis lengths scale with the square roots of the eigenvalues of the covariance matrix

covariance, eigenvalues-eigenvectors, linear algebra, numerical linear algebra

This is my first post on math.stackexchange and I am not a mathematician, but I took some undergrad math courses and some grad mathematical modelling courses, so I come with a basic understanding of math and have already computed a lot of eigenvectors and eigenvalues. I hope my notation is easy to follow.

I am trying to describe a two-dimensional scatter plot by 3 parameters, which define the shape of an ellipse (i.e. the length of the major axis, the length of the minor axis, and the angle between the major axis and the ordinate of a Cartesian coordinate system). The reason for this is that the scatter plot changes over time, and I want to track that change by tracking the change of those 3 parameters, i.e. the shape of the ellipse. I do not need a confidence region or anything like that, just the parameters of an ellipse fitting my data.

All I have (and need) are the x- and y-coordinates of the scatter plot. I understand that the eigenvectors of the covariance matrix give me the orientation of the major and minor axes, which gives me the angle between the major axis and the ordinate (just as in a principal component analysis). What I do not understand is how to get the lengths of the major and minor axes from the covariance matrix. I found two different approaches:

  1. A similar post on math.stackexchange linking to this source code on Wikipedia. They do a Cholesky decomposition of the covariance matrix and transform a unit circle with it. I cannot follow this mathematically, nor do I know why it would be correct (a sketch of the transformation is shown after this list).
  2. A similar post on stackoverflow. They define the width and the height (i.e. the lengths of the major and minor axes) as 2 * nstd * np.sqrt(vals), where nstd denotes the number of standard deviations; in my case this would be 3.
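
For reference, here is a minimal sketch of what approach 1 does mechanically (my own reconstruction, not the actual Wikipedia code): decompose the covariance matrix as $Cov = LL^T$ and map the unit circle through $L$, then scale by 3:

import numpy

# sketch of approach 1: transform a unit circle with the Cholesky factor of Cov
cov = numpy.array([[9.0, 2.4], [2.4, 4.0]])
L = numpy.linalg.cholesky(cov)                         # Cov = L L^T
t = numpy.linspace(0, 2 * numpy.pi, 100)
circle = numpy.vstack((numpy.cos(t), numpy.sin(t)))    # unit circle, shape (2, 100)
ellipse = 3 * L.dot(circle)                            # boundary of the 3-sigma ellipse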

I can follow the second approach better; I used the python code and the result looks great! But I do not understand why the axis lengths scale with the square roots of the eigenvalues. I think there must be a simple relation between the eigenvalues of a covariance matrix and the standard deviation of the data projected onto the ellipse axes, which I just cannot find in the literature and cannot derive myself. Can anyone enlighten me on this or point me to something to read, so that I can understand it better?
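
As a quick numerical sanity check of the relation I suspect (a sketch, not a derivation), the standard deviation of the data projected onto each eigenvector can be compared with the square root of the corresponding eigenvalue:

import numpy

# draw a large sample, project it onto the eigenvector axes of the sample
# covariance, and compare the standard deviations along those axes with the
# square roots of the eigenvalues
cov_theo = numpy.array([[9.0, 2.4], [2.4, 4.0]])
points = numpy.random.multivariate_normal((0, 0), cov_theo, size=100000)
eigvals, eigvecs = numpy.linalg.eig(numpy.cov(points.T))
projected = points.dot(eigvecs)       # coordinates along the eigenvector axes
print(numpy.std(projected, axis=0))   # close to the line below
print(numpy.sqrt(eigvals))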

Best, Nras.

Update:
I got the textbook "Circular Statistics in Biology" by Batschelet, 1981. On pp. 268-274 they calculate the semi-axes and the inclination angle for an example with $\sigma_1=3$ and $\sigma_2=2$ and correlation coefficient $\rho=0.4$. This yields the covariance matrix $$\begin{align*} Cov & =\left[ {\begin{array}{cc}
\sigma_1^2 & \rho\sigma_1\sigma_2\\
\rho\sigma_1\sigma_2 & \sigma_2^2\\
\end{array} } \right] \\
&= \left[ {\begin{array}{cc}
9.0 & 2.4\\
2.4 & 4.0\\
\end{array} } \right]
\end{align*}$$
and go through equations 14.4.8 to 14.4.11 to find the results $$a=3.157,\quad b=1.742,\quad \Theta=21.9^\circ$$ The result just magically appears: I cannot find the referenced equations, as the text stops after 14.4.7, right before this example, and the chapter finishes right after the example. What a pity!

Running the code in python, where the semi-axis lengths are defined by the square roots of the eigenvalues, I obtain the same result:

import numpy

cov = numpy.array([[9., 2.4], [2.4, 4.]])
eigenvals, eigenvecs = numpy.linalg.eig(cov)
numpy.sqrt(eigenvals)                          # semi-axis lengths a and b
numpy.rad2deg(numpy.arccos(eigenvecs[0, 0]))   # inclination of the major axis

The output is as follows:

array([ 3.1568251,  1.7419688])
21.915430336046288

So there must be something that explains why the ellipse semi-axis lengths scale with the square roots of the eigenvalues. I now know that it is true, but I still do not know why.

Update2:
The referenced equations really do not exist, but the calculation is sketched in the previous chapter, 13.8. Summary: let distributions 1 and 2 be centered at $\mu_1$ and $\mu_2$, respectively, and let
$$ A=\sigma_2^2,\quad B=-\rho\sigma_1\sigma_2,\quad C=\sigma_1^2,\quad D=(1-\rho^2)\sigma_1^2\sigma_2^2$$ then the ellipse follows the equation $$ A(x-\mu_1)^2 + 2B(x-\mu_1)(y-\mu_2) + C(y-\mu_2)^2 = D$$
From that equation the semi-axis lengths and the inclination of the major axis are computed as follows: let $$R=[(A-C)^2 + 4B^2]^{1/2}$$ then the semi-major axis $a$ and the semi-minor axis $b$ can be computed as $$b, a = \left(\dfrac{2D}{A+C\pm R}\right)^{1/2}$$ and for the inclination $$\Theta = \arctan\left(\dfrac{2B}{A-C-R}\right)$$
Substituting in the values from the example yields exactly the same solution as the approach with the square roots of the eigenvalues and the inclination obtained from the arccos of the eigenvector component.
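
Explicitly, with my own arithmetic as a check: $$A=4,\quad B=-2.4,\quad C=9,\quad D=(1-0.4^2)\cdot 9\cdot 4=30.24$$ $$R=\sqrt{(4-9)^2+4\cdot(-2.4)^2}=\sqrt{48.04}\approx 6.931$$ $$a=\left(\frac{2\cdot 30.24}{13-6.931}\right)^{1/2}\approx 3.157,\quad b=\left(\frac{2\cdot 30.24}{13+6.931}\right)^{1/2}\approx 1.742,\quad \Theta=\arctan\left(\frac{-4.8}{4-9-6.931}\right)\approx 21.9^\circ$$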

From here it cannot be far to show that $a=\sqrt{\lambda_1}$ and $b=\sqrt{\lambda_2}$, where $\lambda_i$ are the eigenvalues of the covariance matrix.
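
One observation that may point in the right direction (my own note, using only the definitions above): since $\det(Cov)=\sigma_1^2\sigma_2^2(1-\rho^2)=D$, the coefficient matrix of the ellipse equation is exactly $D$ times the inverse covariance matrix, $$\left[ {\begin{array}{cc} A & B\\ B & C\\ \end{array} } \right] = \left[ {\begin{array}{cc} \sigma_2^2 & -\rho\sigma_1\sigma_2\\ -\rho\sigma_1\sigma_2 & \sigma_1^2\\ \end{array} } \right] = D\, Cov^{-1},$$ so the ellipse equation can be rewritten as $(\mathbf{x}-\boldsymbol{\mu})^T Cov^{-1} (\mathbf{x}-\boldsymbol{\mu}) = 1$.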

Best Answer

If anyone is still interested in this problem: in my spare time I think I found an answer. The answer is: the two methods only yield the same lengths for the semi-minor and semi-major axes of the ellipse in the theoretical case that the covariance matrix is exactly equal to the expected covariance matrix of a bivariate gaussian distribution, $$Cov = \left[{\begin{array}{cc} \sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\\ \end{array} }\right]$$ If the covariance matrix comes from a numerical realization of a bivariate gaussian distribution, then it will not look exactly like the matrix above, and the results of the "eigenvalue" method and the "Batschelet, 1981" method begin to deviate. The result can be seen in the python code below:

import numpy

# example 14.4.1 (p. 273), parameter for bivariate gaussian distribution
mu_1 = 5
mu_2 = 4
sigma_1 = 3.0
sigma_2 = 2.0
rho = 0.4   # correlation

# eq 14.4.3 (p. 269), expectation value for covariance-matrix for bivariate gaussian distribution!
cov_theo = numpy.array([[sigma_1**2, rho*sigma_1*sigma_2],
                        [rho*sigma_1*sigma_2, sigma_2**2]])

# make an example (with 10000 points) as sketched in Batschelet, 1981
points = numpy.random.multivariate_normal(mean=(mu_1,mu_2), cov=cov_theo, size=10000)
x, y = points.T
cov = numpy.cov(x, y)   # covariance-matrix by the realization

# get estimations for mu, sigma, rho
mu_1 = numpy.mean(x)
mu_2 = numpy.mean(y)
sigma_1 = numpy.std(x)   # note: numpy.std defaults to ddof=0, while numpy.cov uses ddof=1
sigma_2 = numpy.std(y)
rho = numpy.corrcoef(x, y)[0, 1]

## -- BATSCHELET-METHOD (1981) -- ##
# eq 14.4. (ellipse equation): A(x-mu_1)^2 + 2B(x-mu_1)(y-mu_2) + C(y-mu_2)^2 = D
A = sigma_2**2
B = -rho*sigma_1*sigma_2
C = sigma_1**2
D = (1-rho**2)*sigma_1**2*sigma_2**2

# further calculations (p. 264)
R = numpy.sqrt((A-C)**2 + 4*B**2)
a = numpy.sqrt(2*D/(A+C-R))
b = numpy.sqrt(2*D/(A+C+R))

## -- EIGENVALUE-METHOD from SO -- ##
eigval, _ = numpy.linalg.eig(cov)
eigval = numpy.sort(eigval)[::-1]   # eig does not sort the eigenvalues; order so a_eig >= b_eig
a_eig, b_eig = numpy.sqrt(eigval)

print('realization of covariance_matrix')
print(cov)
print('deviations from what it should look like:')
print('sigma_1^2 - cov[0,0] = {d1}'.format(d1=sigma_1**2 - cov[0, 0]))
print('sigma_2^2 - cov[1,1] = {d2}'.format(d2=sigma_2**2 - cov[1, 1]))
print('rho*sigma_1*sigma_2 - cov[0,1] = {d}'.format(d=rho*sigma_1*sigma_2 - cov[0, 1]))
print('a-a_eig = {diff_a}, b-b_eig = {diff_b}'.format(diff_a=a - a_eig, diff_b=b - b_eig))

The result looks like:

realization of covariance_matrix
[[ 9.00812576  2.4676689 ]
 [ 2.4676689   4.14435948]]
deviations from what it should look like:
sigma_1^2 - cov[0,0] = -0.00090081257572
sigma_2^2 - cov[1,1] = -0.00041443594839
rho*sigma_1*sigma_2 - cov[0,1] = -0.000246766890107
a-a_eig = -0.000158440395525, b-b_eig = -8.82014644565e-05

and it gets worse for smaller sample sizes.
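
A minimal sketch to see this sample-size effect (the helper batschelet_axes is my own, but it uses the same estimators as the code above):

import numpy

def batschelet_axes(x, y):
    # estimate sigma_1, sigma_2, rho from the sample, then apply the formulas above
    s1, s2 = numpy.std(x), numpy.std(y)
    rho = numpy.corrcoef(x, y)[0, 1]
    A, B, C = s2**2, -rho*s1*s2, s1**2
    D = (1 - rho**2) * s1**2 * s2**2
    R = numpy.sqrt((A - C)**2 + 4*B**2)
    return numpy.sqrt(2*D/(A + C - R)), numpy.sqrt(2*D/(A + C + R))

cov_theo = numpy.array([[9.0, 2.4], [2.4, 4.0]])
for size in (100, 1000, 10000):
    points = numpy.random.multivariate_normal((5, 4), cov_theo, size=size)
    x, y = points.T
    a, b = batschelet_axes(x, y)
    eigval = numpy.sort(numpy.linalg.eigvals(numpy.cov(x, y)))[::-1]
    a_eig, b_eig = numpy.sqrt(eigval)
    print('size={s}: a-a_eig={da:.2e}, b-b_eig={db:.2e}'.format(
        s=size, da=a - a_eig, db=b - b_eig))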

In Summary:

  1. The methods only give the same results if the covariance matrix is exactly as assumed.
  2. The sample points must be drawn from a bivariate gaussian distribution. If the points are not taken from such a distribution, it gets even worse.
  3. I am still unable to show that $a=\sqrt{\lambda_1}$ and $b=\sqrt{\lambda_2}$ when the covariance matrix is exactly as assumed.