Confidence Interval – Confidence Regions on Bivariate Normal Distributions Using MLE or S

Tags: confidence-interval, hotelling-t2, maximum-likelihood, multivariate-analysis, normal-distribution

Given a $5 \times 2$ dataset $\mathbf{X} =\left( \begin{array}{rr}-0.9&0.2\\2.4&0.7\\-1.4&1.0\\2.9&-0.5\\2.0&-1.0 \end{array} \right)$. Assume that $X\sim N_2(\mu, \Sigma)$.

Using $\hat{\mu}_{MLE}$ and $\hat{\Sigma}_{MLE}$, I need to calculate the eigenvalues and eigenvectors and draw an 80% probability ellipse. In the statistical books I've seen, it is usually stated that to draw such an ellipse, you use $\mathbf{S} = \frac{1}{n-1}\sum(\mathbf{x}_j - \mathbf{\bar{x}})(\mathbf{x}_j - \mathbf{\bar{x}})'$ and calculate the eigenvalues and eigenvectors of this matrix. Then you can draw the ellipse using these eigenvectors and the calculated $\mathbf{\bar{x}}$.

My Question
Now I know that $\hat{\Sigma}_{MLE} = \frac{n-1}{n}\mathbf{S}$. So what do I need to do here? Is it correct to use $\hat{\Sigma}_{MLE}$ to calculate the eigenvalues and eigenvectors and use them to find the 80% probability ellipse, or do you always need to use $\mathbf{S}$?
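For concreteness, the relationship between the two estimators can be checked numerically; a minimal sketch in base R (using `var`, which computes $\mathbf{S}$ with divisor $n-1$):

```r
# Sketch: comparing S and the MLE covariance estimate for the data above.
x <- matrix(c(-0.9, 2.4, -1.4, 2.9, 2.0,
               0.2, 0.7, 1.0, -0.5, -1.0), ncol = 2)
n <- nrow(x)
S <- var(x)                   # unbiased estimate, divisor n - 1
Sigma_mle <- (n - 1) / n * S  # MLE, divisor n
# Same eigenvectors, so the ellipse axes point in the same directions;
# only the eigenvalues differ, by the factor (n - 1) / n.
eigen(S)$vectors
eigen(Sigma_mle)$values / eigen(S)$values   # all equal to (n - 1) / n = 0.8
```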

Best Answer

Assume first that the parameters $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are known. Just as $\frac{x-\mu}\sigma$ is standard normal and $\frac{(x-\mu)^2}{\sigma^2}$ chi-square with 1 degree of freedom in the univariate case, the quadratic form $(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)$ is chi-square with $p$ degrees of freedom when $\mathbf{x}$ is multivariate normal. Hence, $$ (\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\le \chi_{p,\alpha}^2 \tag{1} $$ holds with probability $(1-\alpha)$, where $\chi^2_{p,\alpha}$ denotes the upper-$\alpha$ quantile. A probability region for $\mathbf{x}$ is found by inverting (1) with respect to $\mathbf{x}$. For points on the boundary of this set, $\mathbf{L}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ lies on a circle with radius $\sqrt{\chi^2_{p,\alpha}}$, where $\mathbf L$ is the Cholesky factor of $\boldsymbol\Sigma$ (or some other square root), such that $$ \mathbf{L}^{-1}(\mathbf{x}-\boldsymbol{\mu})=\sqrt{\chi^2_{p,\alpha}} \left[ \begin{matrix} \cos(\theta)\\ \sin(\theta) \end{matrix} \right]. $$ Hence, the boundary of the set (an ellipse) is described by the parametric curve $$ \mathbf{x}(\theta)= \boldsymbol{\mu} + \sqrt{\chi^2_{p,\alpha}}\,\mathbf{L} \left[ \begin{matrix} \cos(\theta)\\ \sin(\theta) \end{matrix} \right], $$ for $0\le\theta <2\pi$.
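As an illustration of the known-parameter case (with made-up $\boldsymbol\mu$ and $\boldsymbol\Sigma$, not the data from the question), every point on the parametric boundary satisfies the quadratic form in (1) with equality:

```r
# Sketch: 80% probability ellipse for known parameters (illustrative values).
mu    <- c(1, 2)
Sigma <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
L     <- t(chol(Sigma))                # lower-triangular Cholesky factor
r     <- sqrt(qchisq(0.80, df = 2))    # radius: sqrt of chi-square quantile
theta <- seq(0, 2 * pi, length.out = 200)
ellipse <- mu + r * L %*% rbind(cos(theta), sin(theta))
# Check: the quadratic form is constant on the boundary.
q <- colSums((solve(L) %*% (ellipse - mu))^2)
range(q)   # both ends equal qchisq(0.80, 2) = -2 log(0.2), about 3.219
```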

If the parameters are unknown and we use $\bar{\mathbf{x}}$ to estimate $\boldsymbol\mu$, then $\mathbf{x}-\bar{\mathbf{x}} \sim N_p(\mathbf 0,(1+1/n)\boldsymbol{\Sigma})$. Hence, $(1+1/n)^{-1}(\mathbf{x}-\bar{\mathbf{x}})^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\bar{\mathbf{x}})$ is chi-square with $p$ degrees of freedom. Substituting $\boldsymbol\Sigma$ by its estimate $\hat{\boldsymbol\Sigma}=\mathbf S=\frac1{n-1}\sum_j(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})^T$, the resulting pivot is instead Hotelling $T$-squared distributed with $p$ and $n-p$ degrees of freedom (analogous to the $F_{1,n-1}$-distributed squared $t$-statistic in the univariate case), such that $$ \Big(1+\frac1n\Big)^{-1}(\mathbf{x}-\bar{\mathbf{x}})^T\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}-\bar{\mathbf{x}}) \le T^2_{p,n-p,\alpha} \tag{2} $$ with probability $(1-\alpha)$. Because the Hotelling $T$-squared distribution is just a rescaled $F$-distribution, the above quantile equals $\frac{p(n-1)}{n-p}F_{p,n-p,\alpha}$.
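A quick sanity check of this quantile identity in R, for the question's setting ($p=2$, $n=5$, and an 80% region, i.e. $\alpha=0.2$):

```r
# Hotelling T^2 quantile via the rescaled F quantile (p = 2, n = 5, alpha = 0.2)
p <- 2; n <- 5; alpha <- 0.2
T2 <- p * (n - 1) / (n - p) * qf(alpha, p, n - p, lower.tail = FALSE)
# Consistency check: mapping back to the F scale recovers the 1 - alpha level.
pf(T2 * (n - p) / (p * (n - 1)), p, n - p)   # 0.8
```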

Inverting (2) with respect to $\mathbf{x}$ leads to a prediction region with boundary described by the parametric curve $$ \mathbf{x}(\theta)= \bar{\mathbf x} + \sqrt{\Big(1+\frac1n\Big)\frac{p(n-1)}{n-p}F_{p,n-p,\alpha}}\,\hat{\mathbf{L}} \left[ \begin{matrix} \cos(\theta)\\ \sin(\theta) \end{matrix} \right], $$ where $\hat{\mathbf L}$ is the Cholesky factor of the sample covariance matrix $\hat{\boldsymbol\Sigma}$.

Code computing this for the data in the original question:

pred.int.mvnorm <- function(x, alpha = .05) {
  p <- ncol(x)
  n <- nrow(x)
  Sigmahat <- var(x)                 # sample covariance matrix S
  xbar <- colMeans(x)                # sample mean vector
  theta <- seq(0, 2*pi, length = 100)
  # Boundary: xbar + sqrt((1 + 1/n) T^2_{p,n-p,alpha}) L [cos(theta); sin(theta)]
  polygon <- xbar + 
    sqrt(p*(n - 1)/(n - p)*(1 + 1/n)*qf(alpha, p, n - p, lower.tail = FALSE))*
    t(chol(Sigmahat)) %*%            # lower-triangular Cholesky factor L
    rbind(cos(theta), sin(theta))
  t(polygon)                         # one boundary point per row
}
x <- matrix(c(-0.9,2.4,-1.4,2.9,2.0,0.2,0.7,1.0,-0.5,-1.0),ncol=2)
plot(pred.int.mvnorm(x), type="l",xlab=expression(x[1]),ylab=expression(x[2]))
points(x)

(Plot: the prediction ellipse for the data, with the five observations shown as points.)

More code, testing the coverage of the region at the default level (`alpha = .05`, i.e. a 95% region):

library(mvtnorm)   # rmvnorm
library(sp)        # point.in.polygon
hits <- 0
for (i in 1:1e+5) {
  x <- rmvnorm(6, sigma = diag(2))      # 5 "training" points plus 1 new point
  pred.int <- pred.int.mvnorm(x[-1,])   # region computed from the last 5 rows
  x <- x[1,]                            # the held-out point
  if (point.in.polygon(x[1], x[2], pred.int[,1], pred.int[,2])==1)
    hits <- hits + 1
}
hits
[1] 94955