Geometry – Distance of a Test Point from the Center of an Ellipsoid

Tags: geometry, linear algebra, multivariable-calculus, probability, statistics

I'm trying to learn about the Mahalanobis distance and I'm pretty close to getting the idea. I've learned that the distance has a lot to do with the properties of an ellipsoid. So far I have understood that:

The Mahalanobis distance is simply the distance of the test point $\textbf{x}$ from the center of mass $\textbf{y}$ divided by the width of the ellipsoid in the direction of the test point and is given by the formula:

$$D(\textbf{x},\textbf{y})=\sqrt{ (\textbf{x}-\textbf{y})^TC^{-1}(\textbf{x}-\textbf{y})} $$

Now my question is: "Why does this formula give us the distance of a point $\textbf{x}$ from the center of mass $\textbf{y}$ divided by the width of the ellipsoid in the direction of the test point?" =)

I don't understand or see how this formula describes that distance. Could someone help explain this distance further? How is the plain distance of a point $\textbf{x}$ from the ellipsoid's center of mass $\textbf{y}$, in the direction of the test point, found, and why? =)

I hope my question is clear enough. My question is analogous to, for example, "Why does $c^2 = a^2 + b^2$?": you would then prove it to me with a geometric proof or something =)

Thank you for any help =)

P.S. $C$ is the covariance matrix of vector $\textbf{x} = (x_1, …, x_n)$

Best Answer

I would like to introduce a geometric example to fix notation and then move on to a (quite informal) analysis of the Mahalanobis distance.

  • Ellipsoid in $\mathbb R^{3}$: an easy example

Let us introduce the geometric definition of an ellipsoid centered at $y\in\mathbb R^{3}$, i.e. the locus of points $x\in\mathbb R^{3}$ satisfying

$$\langle(x-y),A^{-1}(x-y)\rangle=1.$$

Here $A$ is any positive definite matrix. For example, if $A=\operatorname{diag}(a^2,b^2,c^2)$, $x=(x_1,x_2,x_3)$ and $y=(y_1,y_2,y_3)$, then the ellipsoid we are looking for is

$$\frac{(x_1-y_1)^2}{a^2}+ \frac{(x_2-y_2)^2}{b^2}+ \frac{(x_3-y_3)^2}{c^2}=1.$$
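
A quick numerical check of this diagonal case, written as a small Python sketch: moving from the center by exactly $a$ along the 1st axis lands on the ellipsoid, while the same displacement along a shorter axis overshoots. The function name `ellipsoid_lhs` and all numbers are illustrative choices, not part of the original answer.

```python
def ellipsoid_lhs(x, y, semi_axes):
    """Left-hand side sum_i (x_i - y_i)^2 / a_i^2 for a diagonal A = diag(a_i^2)."""
    return sum((xi - yi) ** 2 / ai ** 2 for xi, yi, ai in zip(x, y, semi_axes))

y = (1.0, -2.0, 0.5)      # center of the ellipsoid (arbitrary example)
a, b, c = 3.0, 2.0, 0.5   # semi-axis lengths (arbitrary example)

# A displacement of exactly a along the 1st axis reaches the surface.
print(ellipsoid_lhs((y[0] + a, y[1], y[2]), y, (a, b, c)))   # 1.0
# The same displacement along the 3rd axis lies far outside, since c < a.
print(ellipsoid_lhs((y[0], y[1], y[2] + a), y, (a, b, c)))   # 36.0
```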

The parameter $a$ is the semi-axis length along the 1st axis: it is the distance between the center $y$ and the points of the ellipsoid on the line through $y$ parallel to that axis, namely $x=y\pm(a,0,0)$. Similar considerations hold for $b$ and $c$. The ratios $\frac{a}{b}, \frac{b}{c}$ and $\frac{a}{c}$ determine the "shape" (e.g. "oblate" or "prolate") of the given ellipsoid.

  • Ellipsoids and Mahalanobis distance in $\mathbb R^{n}$

The Mahalanobis distance

$$D(x,y)=\sqrt{\langle(x-y),C^{-1}(x-y)\rangle}$$

gives, by definition, the distance between two vectors $x$ and $y$ in $\mathbb R^{n}$ of realizations of a given multivariate random process $X=(X_1,\dots,X_n)$, where $C$ is (an estimate of) its covariance matrix. Comparing this formula with the geometric definition of the ellipsoid above, you can identify the vector $y$ in $D(x,y)$ with the center of the ellipsoid defined by $D(x,y)=d$, for some $d>0$. To be really precise: squaring gives $\langle(x-y),C^{-1}(x-y)\rangle=d^2$, so to arrive at the normalized form $\langle(x-y),A^{-1}(x-y)\rangle=1$ you divide the inverse $C^{-1}$ of the covariance matrix by $d^2$, i.e. you take $A=d^2C$.
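
A minimal Python sketch of this rescaling, assuming an example $2\times 2$ covariance matrix. The helper names `solve`, `quad_form` and `mahalanobis` are hypothetical, and the linear system is solved by plain Gaussian elimination instead of forming $C^{-1}$ explicitly.

```python
import math

def solve(M, v):
    """Solve M w = v by Gaussian elimination with partial pivoting (M square, invertible)."""
    n = len(v)
    # Augmented matrix [M | v], copied so the inputs are not modified.
    aug = [list(map(float, row)) + [float(vi)] for row, vi in zip(M, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][k] * w[k] for k in range(r + 1, n))
        w[r] = acc / aug[r][r]
    return w

def quad_form(M, v):
    """<v, M^{-1} v>, computed by solving M w = v."""
    w = solve(M, v)
    return sum(vi * wi for vi, wi in zip(v, w))

def mahalanobis(x, y, C):
    """D(x, y) = sqrt(<x - y, C^{-1}(x - y)>)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return math.sqrt(quad_form(C, diff))

C = [[4.0, 1.2], [1.2, 1.0]]   # example covariance (symmetric positive definite)
y = [0.0, 0.0]                 # center
x = [2.0, -1.0]                # test point

d = mahalanobis(x, y, C)
# Taking A = d^2 C (i.e. dividing C^{-1} by d^2) turns the level set D = d
# into the normalized ellipsoid <x - y, A^{-1}(x - y)> = 1.
A = [[d * d * cij for cij in row] for row in C]
print(d, quad_form(A, [xi - yi for xi, yi in zip(x, y)]))  # second value is 1 up to rounding
```

For this $C$ and $x$ one gets $\langle x-y, C^{-1}(x-y)\rangle = 12.8/2.56 = 5$, so $D(x,y)=\sqrt5$, and after the rescaling the test point lies exactly on the normalized ellipsoid.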

  • On the second question

The sentence "The Mahalanobis distance is simply the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point." can be understood by looking at a simplified case.
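
For a general, not necessarily diagonal, covariance matrix the sentence can also be verified directly; the following short derivation is a sketch added for completeness. Assume $x\neq y$, write $v=x-y$ and $u=v/\|v\|$ for the unit vector pointing from the center towards the test point, and read "width of the ellipsoid in the direction of the test point" as the distance $r(u)$ from $y$ to the surface of the ellipsoid $D(\cdot,y)=1$ along $u$ (the symbols $v$, $u$ and $r(u)$ are introduced here only for illustration). The surface point in that direction is $y+r(u)\,u$, so

$$\langle r(u)\,u,\;C^{-1}\,r(u)\,u\rangle=1\quad\Longrightarrow\quad r(u)=\frac{1}{\sqrt{\langle u,C^{-1}u\rangle}},$$

which is well defined because $C^{-1}$ is positive definite. Therefore

$$\frac{\|x-y\|}{r(u)}=\|v\|\sqrt{\langle u,C^{-1}u\rangle}=\sqrt{\langle v,C^{-1}v\rangle}=D(x,y).$$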

Let us keep the above set up: $x$ is a random vector of realizations of the multivariate random process $X=(X_1,\dots,X_n)$. Let us introduce the vector $s=(s_1,\dots,s_n)$ of standardized variables

$$s_i:=\frac{x_i-\mu_i}{\sigma_i},$$

denoting by $\mu_i$ the mean of the realizations of the $i$-th random process $X_i$ and by $\sigma_i$ its standard deviation (so that $\sigma_i^2$ is its variance). Let us consider the Mahalanobis distance of a vector $x$ from the center $\mu=(\mu_1,\dots,\mu_n)$, i.e.

$$D(x,\mu)=\sqrt{\langle (x-\mu),C^{-1}(x-\mu) \rangle }, $$

where the covariance matrix is the diagonal matrix $C=\operatorname{diag}(\sigma^2_1,\dots,\sigma^2_n)$ (i.e. the components $X_i$ are assumed to be uncorrelated). Then

$$D(x,\mu)=\sqrt{\sum_{i=1}^{n}\frac{(x_i-\mu_i)^2}{\sigma_i^2}}=\sqrt{\langle s,s\rangle } ,$$
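
A small numerical check of this identity for a diagonal $C$, written in Python with made-up means and standard deviations; the function names `mahalanobis_diag` and `standardize` are illustrative, not from any library.

```python
import math

def mahalanobis_diag(x, mu, sigma):
    """Mahalanobis distance for a diagonal covariance C = diag(sigma_i^2)."""
    return math.sqrt(sum((xi - mi) ** 2 / si ** 2 for xi, mi, si in zip(x, mu, sigma)))

def standardize(x, mu, sigma):
    """Standardized variables s_i = (x_i - mu_i) / sigma_i."""
    return [(xi - mi) / si for xi, mi, si in zip(x, mu, sigma)]

mu = [10.0, 0.0, -3.0]    # means (arbitrary example values)
sigma = [2.0, 0.5, 4.0]   # standard deviations, not variances
x = [13.0, 0.5, -11.0]    # test point

s = standardize(x, mu, sigma)
print(s)                                     # [1.5, 1.0, -2.0]
print(mahalanobis_diag(x, mu, sigma))        # sqrt(7.25), about 2.693
print(math.sqrt(sum(si * si for si in s)))   # the same value: the Euclidean length of s
```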

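by construction. As in the geometric example above, the standard deviation $\sigma_i$ in $C=\operatorname{diag}(\sigma^2_1,\dots,\sigma^2_n)$ controls the "shape" (or width) of the ellipsoid defined by the Mahalanobis distance along the $i$-th axis: the ellipsoid $D(x,\mu)=1$ has semi-axis length $\sigma_i$ in that direction, so each $s_i$ measures the displacement along the $i$-th axis in units of that width.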
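
The diagonal case is not essential: for any positive definite $C$, the Mahalanobis distance equals the plain distance $\|x-\mu\|$ divided by the distance from $\mu$ to the surface of the ellipsoid $D(\cdot,\mu)=1$ along the direction of $x-\mu$. The Python sketch below checks this numerically in $\mathbb R^2$, measuring that width by bisection rather than by a closed-form expression; the covariance values, the test point, and the helper names `quad` and `width` are illustrative assumptions.

```python
import math

# Example covariance in R^2 with correlated components (illustrative values).
C = [[4.0, 1.2], [1.2, 1.0]]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
C_inv = [[C[1][1] / det, -C[0][1] / det], [-C[1][0] / det, C[0][0] / det]]

def quad(v):
    """<v, C^{-1} v>."""
    return sum(v[i] * C_inv[i][j] * v[j] for i in range(2) for j in range(2))

def width(u, tol=1e-12):
    """Distance from the center to the ellipsoid <z, C^{-1} z> = 1 along the unit
    vector u, found by bisection on t so that no closed-form formula is used."""
    lo, hi = 0.0, 1.0
    while quad([hi * u[0], hi * u[1]]) < 1.0:   # grow hi until it lies outside
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quad([mid * u[0], mid * u[1]]) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = [0.0, 0.0]   # center
x = [3.0, 1.0]    # test point
v = [x[0] - mu[0], x[1] - mu[1]]
norm_v = math.hypot(v[0], v[1])
u = [v[0] / norm_v, v[1] / norm_v]   # unit direction of the test point

ratio = norm_v / width(u)   # plain distance divided by the width in direction u
D = math.sqrt(quad(v))      # Mahalanobis distance from the formula
print(ratio, D)             # the two values agree up to the bisection tolerance
```

Bisection is used here only so that the width is measured independently of the formula; in practice one simply evaluates $D$ directly.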