Could someone come up with R code to plot an ellipse from the eigenvalues and the eigenvectors of the following matrix?
$$
\mathbf{A} = \left(
\begin{array} {cc}
2.2 & 0.4\\
0.4 & 2.8
\end{array} \right)
$$
R Ellipse Plotting – How to Use Eigenvalues and Eigenvectors
Tags: geometry, matrix, matrix decomposition, multivariate analysis, r
Related Solutions
All the summaries of $\mathbf X$ displayed in the question depend only on its second moments; or, equivalently, on the matrix $\mathbf{X^\prime X}$. Because we are thinking of $\mathbf X$ as a point cloud--each point is a row of $\mathbf X$--we may ask what simple operations on these points preserve the properties of $\mathbf{X^\prime X}$.
One is to left-multiply $\mathbf X$ by an $n\times n$ matrix $\mathbf U$, which would produce another $n\times 2$ matrix $\mathbf{UX}$. For this to work, it is essential that
$$\mathbf{X^\prime X} = \mathbf{(UX)^\prime UX} = \mathbf{X^\prime (U^\prime U) X}.$$
Equality is guaranteed when $\mathbf{U^\prime U}$ is the $n\times n$ identity matrix: that is, when $\mathbf{U}$ is orthogonal.
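A quick numerical check of this invariance in R (a sketch; the random $\mathbf X$ and $\mathbf U$ below are merely illustrative, not from the text):

set.seed(1)
X <- matrix(rnorm(8), ncol = 2)             # an n x 2 point cloud, n = 4
U <- qr.Q(qr(matrix(rnorm(16), 4, 4)))      # a random 4 x 4 orthogonal matrix
all.equal(crossprod(X), crossprod(U %*% X)) # TRUE: X'X is unchanged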
It is well known (and easy to demonstrate) that orthogonal matrices are products of Euclidean reflections and rotations (they form a reflection group in $\mathbb{R}^n$). By choosing rotations wisely, we can dramatically simplify $\mathbf{X}$. One idea is to focus on rotations that affect only two points in the cloud at a time. These are particularly simple, because we can visualize them.
Specifically, let $(x_i, y_i)$ and $(x_j, y_j)$ be two distinct nonzero points in the cloud, constituting rows $i$ and $j$ of $\mathbf{X}$. A rotation of the column space $\mathbb{R}^n$ affecting only these two points converts them to
$$\cases{(x_i^\prime, y_i^\prime) = (\cos(\theta)x_i + \sin(\theta)x_j, \cos(\theta)y_i + \sin(\theta)y_j) \\ (x_j^\prime, y_j^\prime) = (-\sin(\theta)x_i + \cos(\theta)x_j, -\sin(\theta)y_i + \cos(\theta)y_j).}$$
What this amounts to is drawing the vectors $(x_i, x_j)$ and $(y_i, y_j)$ in the plane and rotating them by the angle $\theta$. (Notice how the coordinates get mixed up here! The $x$'s go with each other and the $y$'s go together. Thus, the effect of this rotation in $\mathbb{R}^n$ will not usually look like a rotation of the vectors $(x_i, y_i)$ and $(x_j, y_j)$ as drawn in $\mathbb{R}^2$.)
By choosing the angle just right, we can zero out any one of these new components. To be concrete, let's choose $\theta$ so that
$$\cases{\cos(\theta) = \pm \frac{x_i}{\sqrt{x_i^2 + x_j^2}} \\ \sin(\theta) = \pm \frac{x_j}{\sqrt{x_i^2 + x_j^2}}}.$$
This makes $x_j^\prime=0$. Choose the sign to make $y_j^\prime \ge 0$. Let's call this operation, which changes points $i$ and $j$ in the cloud represented by $\mathbf X$, $\gamma(i,j)$.
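Here is a minimal R sketch of this operation (the function name gamma_ij is mine, not standard; it assumes $x_i$ and $x_j$ are not both zero):

## Rotate rows i and j of X so that X[j, 1] becomes 0, choosing the
## sign so that X[j, 2] ends up nonnegative.
gamma_ij <- function(X, i, j) {
  r <- sqrt(X[i, 1]^2 + X[j, 1]^2)
  G <- rbind(c( X[i, 1], X[j, 1]),
             c(-X[j, 1], X[i, 1])) / r            # cos = x_i/r, sin = x_j/r
  if ((G %*% X[c(i, j), ])[2, 2] < 0) G <- -G     # flip sign so y_j' >= 0
  X[c(i, j), ] <- G %*% X[c(i, j), ]
  X
}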
Recursively applying $\gamma(1,2), \gamma(1,3), \ldots, \gamma(1,n)$ to $\mathbf{X}$ will cause the first column of $\mathbf{X}$ to be nonzero only in the first row. Geometrically, we will have moved all but one point in the cloud onto the $y$ axis. Now we may apply a single rotation, potentially involving coordinates $2, 3, \ldots, n$ in $\mathbb{R}^n$, to squeeze those $n-1$ points down to a single point. Equivalently, $\mathbf{X}$ has been reduced to a block form
$$\mathbf{X} = \pmatrix{x_1^\prime & y_1^\prime \\ \mathbf{0} & \mathbf{z}},$$
with $\mathbf{0}$ and $\mathbf{z}$ both column vectors with $n-1$ coordinates, in such a way that
$$\mathbf{X^\prime X} = \pmatrix{\left(x_1^\prime\right)^2 & x_1^\prime y_1^\prime \\ x_1^\prime y_1^\prime & \left(y_1^\prime\right)^2 + ||\mathbf{z}||^2}.$$
This final rotation further reduces $\mathbf{X}$ to its upper triangular form
$$\mathbf{X} = \pmatrix{x_1^\prime & y_1^\prime \\ 0 & ||\mathbf{z}|| \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0}.$$
In effect, we can now understand $\mathbf{X}$ in terms of the much simpler $2\times 2$ matrix $\pmatrix{x_1^\prime & y_1^\prime \\ 0 & ||\mathbf{z}||}$ created by the last two nonzero points left standing.
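Continuing the sketch, the whole reduction can be written as a short helper (reduce2x2 is a hypothetical name, building on gamma_ij above; in practice R's qr() performs this reduction directly):

## Sweep gamma(1, j) down the first column, then collapse rows 2..n
## to the single point (0, ||z||).
reduce2x2 <- function(X) {
  for (j in 2:nrow(X)) X <- gamma_ij(X, 1, j)
  rbind(X[1, ], c(0, sqrt(sum(X[-1, 2]^2))))
}
## Sanity check: the reduction preserves X'X (up to rounding), e.g.
## all.equal(crossprod(X), crossprod(reduce2x2(X)))   # TRUE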
To illustrate, I drew four iid points from a bivariate Normal distribution and rounded their values to
$$\mathbf{X} = \pmatrix{ 0.09 & 0.12 \\ -0.31 & -0.63 \\ 0.74 & -0.23 \\ -1.8 & -0.39}$$
This initial point cloud is shown at the left of the next figure using solid black dots, with colored arrows pointing from the origin to each dot (to help us visualize them as vectors).
The sequence of operations effected on these points by $\gamma(1,2), \gamma(1,3),$ and $\gamma(1,4)$ results in the clouds shown in the middle. At the very right, the three points lying along the $y$ axis have been coalesced into a single point, leaving a representation of the reduced form of $\mathbf X$. The length of the vertical red vector is $||\mathbf{z}||$; the other (blue) vector is $(x_1^\prime, y_1^\prime)$.
Notice the faint dotted shape drawn for reference in all five panels. It represents the last remaining flexibility in representing $\mathbf X$: as we rotate the first two rows, the last two vectors trace out this ellipse. Thus, the first vector traces out the path
$$\theta\ \to\ (\cos(\theta)x_1^\prime, \cos(\theta) y_1^\prime + \sin(\theta)||\mathbf{z}||)\tag{1}$$
while the second vector traces out the same path according to
$$\theta\ \to\ (-\sin(\theta)x_1^\prime, -\sin(\theta) y_1^\prime + \cos(\theta)||\mathbf{z}||).\tag{2}$$
We may avoid tedious algebra by noting that because this curve is the image of the set of points $\{(\cos(\theta), \sin(\theta))\,:\, 0 \le \theta\lt 2\pi\}$ under the linear transformation determined by
$$(1,0)\ \to\ (x_1^\prime, 0);\quad (0,1)\ \to\ (y_1^\prime, ||\mathbf{z}||),$$
it must be an ellipse. (Question 2 has now been fully answered.) Thus there will be four critical values of $\theta$ in the parameterization $(1)$, of which two correspond to the ends of the major axis and two correspond to the ends of the minor axis; and it immediately follows that simultaneously $(2)$ gives the ends of the minor axis and major axis, respectively. If we choose such a $\theta$, the corresponding points in the point cloud will be located at the ends of the principal axes, like this:
Because these are orthogonal and are directed along the axes of the ellipse, they correctly depict the principal axes: the PCA solution. That answers Question 1.
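To see this numerically, here is a sketch that traces path $(1)$ for the reduced form of the example $\mathbf X$ above (obtained via qr() rather than the manual rotations; the signs may differ from the text's convention, but the curve is the same ellipse) and overlays the semi-axes from eigen(crossprod(X)); the arrow tips land exactly on the ellipse:

X <- matrix(c(0.09, -0.31, 0.74, -1.8,
              0.12, -0.63, -0.23, -0.39), ncol = 2)
R <- qr.R(qr(X))                       # the 2 x 2 reduced form (up to signs)
theta <- seq(0, 2 * pi, length.out = 200)
path1 <- cbind(cos(theta) * R[1, 1],
               cos(theta) * R[1, 2] + sin(theta) * R[2, 2])
plot(path1, type = "l", lty = 3, asp = 1, xlab = "x", ylab = "y")
e <- eigen(crossprod(X))               # principal axes of the point cloud
arrows(0, 0, sqrt(e$values) * e$vectors[1, ], sqrt(e$values) * e$vectors[2, ])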
The analysis given here complements that of my answer at "Bottom to top explanation of the Mahalanobis distance." There, by examining rotations and rescalings in $\mathbb{R}^2$, I explained how any point cloud in $p=2$ dimensions geometrically determines a natural coordinate system for $\mathbb{R}^2$. Here, I have shown how it geometrically determines an ellipse which is the image of a circle under a linear transformation. This ellipse is, of course, an isocontour of constant Mahalanobis distance.
Another thing accomplished by this analysis is to display an intimate connection between QR decomposition (of a rectangular matrix) and the Singular Value Decomposition, or SVD. The $\gamma(i,j)$ are known as Givens rotations. Their composition constitutes the orthogonal, or "$Q$", part of the QR decomposition. What remained--the reduced form of $\mathbf{X}$--is the upper triangular, or "$R$" part of the QR decomposition. At the same time, the rotation and rescalings (described as relabelings of the coordinates in the other post) constitute the $\mathbf{D}\cdot \mathbf{V}^\prime$ part of the SVD, $\mathbf{X} = \mathbf{U\, D\, V^\prime}$. The rows of $\mathbf{U}$, incidentally, form the point cloud displayed in the last figure of that post.
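In R the pieces line up directly (a sketch, reusing the example X defined above):

qr.R(qr(X))                 # the "R" part of QR: the reduced form (up to signs)
s <- svd(X)                 # X = U D V'
all.equal(s$d^2, eigen(crossprod(X))$values)  # TRUE: D^2 are the eigenvalues of X'X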
Finally, the analysis presented here generalizes in obvious ways to the cases $p\ne 2$: that is, when there is just one principal component or more than two.
Assume first that the parameters $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are known. Just as $\frac{x-\mu}\sigma$ is standard normal and $\frac{(x-\mu)^2}{\sigma^2}$ is chi-square with 1 degree of freedom in the univariate case, the quadratic form $(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$ is chi-square with $p$ degrees of freedom when $\mathbf{x}$ is multivariate normal. Hence this pivotal quantity satisfies
$$
(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\le \chi_{p,\alpha}^2 \tag{1}
$$
with probability $1-\alpha$. A probability region for $\mathbf{x}$ is found by inverting (1) with respect to $\mathbf{x}$. For points at the boundary of this set, $\mathbf{L}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ lies on a circle of radius $\sqrt{\chi^2_{p,\alpha}}$, where $\mathbf L$ is the Cholesky factor of $\boldsymbol\Sigma$ (or some other square root), so that
$$
\mathbf{L}^{-1}(\mathbf{x}-\boldsymbol{\mu})=\sqrt{\chi^2_{p,\alpha}} \left[ \begin{matrix} \cos(\theta)\\ \sin(\theta) \end{matrix} \right].
$$
Hence the boundary of the set (an ellipse) is described by the parametric curve
$$
\mathbf{x}(\theta)= \boldsymbol{\mu} + \sqrt{\chi^2_{p,\alpha}}\,\mathbf{L} \left[ \begin{matrix} \cos(\theta)\\ \sin(\theta) \end{matrix} \right],
$$
for $0\le\theta <2\pi$.
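In R the known-parameter case looks like this (a minimal sketch; $\boldsymbol\mu$ and $\boldsymbol\Sigma$ here are illustrative, with $\boldsymbol\Sigma$ set to the matrix $\mathbf A$ from the question, $\alpha = 0.05$, and $p = 2$):

mu <- c(0, 0)
Sigma <- matrix(c(2.2, 0.4, 0.4, 2.8), 2)  # the matrix A from the question
L <- t(chol(Sigma))                        # lower-triangular factor, L L' = Sigma
theta <- seq(0, 2 * pi, length.out = 200)
radius <- sqrt(qchisq(0.95, df = 2))       # sqrt of the chi-square quantile
ell <- t(mu + radius * L %*% rbind(cos(theta), sin(theta)))
plot(ell, type = "l", asp = 1, xlab = expression(x[1]), ylab = expression(x[2]))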
If the parameters are unknown and we use $\bar{\mathbf{x}}$ to estimate $\boldsymbol\mu$, then $\mathbf{x}-\bar{\mathbf{x}} \sim N_p(\mathbf 0,(1+1/n)\boldsymbol{\Sigma})$. Hence, $(1+1/n)^{-1}(\mathbf{x}-\bar{\mathbf{x}})^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\bar{\mathbf{x}})$ is chi-square with $p$ degrees of freedom. Substituting $\boldsymbol\Sigma$ by its estimate $\hat{\boldsymbol\Sigma}=\frac1{n-1}\mathbf{X}^T \mathbf{X}$ (the sample covariance matrix, with $\mathbf X$ here denoting the centered data matrix), the resulting pivot is instead Hotelling $T$-squared distributed with $p$ and $n-p$ degrees of freedom (analogous to the $F_{1,n-1}$-distributed squared $t$-statistic in the univariate case), so that
$$
\Big(1+\frac1n\Big)^{-1}(\mathbf{x}-\bar{\mathbf{x}})^T\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}-\bar{\mathbf{x}}) \le T^2_{p,n-p,\alpha} \tag{2}
$$
with probability $1-\alpha$. Because the Hotelling $T$-squared distribution is just a rescaled $F$-distribution, the above quantile equals $\frac{p(n-1)}{n-p}F_{p,n-p,\alpha}$.
Inverting (2) with respect to $\mathbf{x}$ leads to a prediction region with boundary described by the parametric curve
$$
\mathbf{x}(\theta)= \bar{\mathbf x} + \sqrt{\Big(1+\frac1n\Big)\frac{p(n-1)}{n-p}F_{p,n-p,\alpha}}\,\hat{\mathbf{L}} \left[ \begin{matrix} \cos(\theta)\\ \sin(\theta) \end{matrix} \right],
$$
where $\hat{\mathbf L}$ is the Cholesky factor of the sample covariance matrix $\hat{\boldsymbol\Sigma}$.
Code computing this for the data in the original question:
pred.int.mvnorm <- function(x, alpha = 0.05) {
  p <- ncol(x)
  n <- nrow(x)
  Sigmahat <- var(x)                  # sample covariance matrix
  xbar <- colMeans(x)                 # sample mean vector
  theta <- seq(0, 2 * pi, length = 100)
  # boundary of the prediction region, as a parametric curve in theta
  polygon <- xbar +
    sqrt(p * (n - 1) / (n - p) * (1 + 1/n) * qf(alpha, p, n - p, lower.tail = FALSE)) *
    t(chol(Sigmahat)) %*%
    rbind(cos(theta), sin(theta))
  t(polygon)
}
x <- matrix(c(-0.9,2.4,-1.4,2.9,2.0,0.2,0.7,1.0,-0.5,-1.0),ncol=2)
plot(pred.int.mvnorm(x), type="l",xlab=expression(x[1]),ylab=expression(x[2]))
points(x)
More code, testing the coverage of the prediction region by simulation:
library(mvtnorm)
library(sp)
hits <- 0
for (i in 1:1e+5) {
  x <- rmvnorm(6, sigma = diag(2))       # 6 points: 5 to fit the region, 1 to test
  pred.int <- pred.int.mvnorm(x[-1, ])   # region from the last 5 points
  x <- x[1, ]                            # held-out point
  if (point.in.polygon(x[1], x[2], pred.int[, 1], pred.int[, 2]) == 1)
    hits <- hits + 1
}
hits
[1] 94955
The estimated coverage, $94955/10^5 \approx 0.95$, agrees with the nominal level $1-\alpha = 0.95$.
Best Answer
You could extract the eigenvectors and eigenvalues via eigen(A). However, it's simpler to use the Cholesky decomposition. Note that when plotting confidence ellipses for data, the ellipse axes are usually scaled to have length equal to the square root of the corresponding eigenvalues, and this is what the Cholesky decomposition gives.

Edit: in order to plot the eigenvectors as well, you have to use the more complicated approach. This is equivalent to suncoolsu's answer; it just uses matrix notation to shorten the code.
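Since the shortened code itself is not reproduced in this excerpt, here is a minimal sketch of the idea, assuming the matrix $\mathbf A$ from the question: draw the ellipse from the Cholesky factor of $\mathbf A$, then add the eigenvector axes scaled by the square roots of the eigenvalues.

A <- matrix(c(2.2, 0.4, 0.4, 2.8), 2)
theta <- seq(0, 2 * pi, length.out = 200)
ell <- t(t(chol(A)) %*% rbind(cos(theta), sin(theta)))   # unit circle mapped by L
plot(ell, type = "l", asp = 1, xlab = expression(x[1]), ylab = expression(x[2]))
e <- eigen(A)                                            # principal axes of the ellipse
arrows(0, 0, sqrt(e$values) * e$vectors[1, ], sqrt(e$values) * e$vectors[2, ])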