Solved – Maximum likelihood of multivariate t-distributed variable with scaled covariance

maximum-likelihood, multivariate-analysis, r

I am trying to estimate the covariance of an iid multivariate t-distributed random variable, where I define the multivariate density as in the Statlect textbook, which is the same as on the Wikipedia page. The density in question is given by
$$f_{X}\left(x\right)=\frac{\Gamma\left(\frac{v+p}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v^{p}\pi^{p}\left|\det \Sigma\right|}}\left(1+\frac{1}{v}x^{\prime}\Sigma^{-1}x\right)^{-\frac{v+p}{2}}$$
where $x$ is a realization of the $p\times1$ random vector $X$ with covariance $V[X]=\frac{v}{v-2}\Sigma$, and $v$ is the degrees of freedom parameter.
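
To fix ideas, this density is straightforward to code up and compare against dmvt from the mvtnorm package (the package and the helper name ldmvt are not part of my setup, just a cross-check; note that dmvt returns the log-density by default):

ldmvt <- function(x, Sigma, v){
  # Log-density of the zero-location multivariate t above
  p <- length(x)
  lgamma((v+p)/2) - lgamma(v/2) -
    0.5*(p*log(v) + p*log(pi) + log(det(Sigma))) -
    (v+p)/2*log(1 + drop(t(x)%*%solve(Sigma)%*%x)/v)
}

library(mvtnorm)
S <- matrix(c(1,0.5,0.5,1),2,2)
ldmvt(c(0.3,-1.2), S, v=5)        # hand-coded log-density
dmvt(c(0.3,-1.2), sigma=S, df=5)  # should agree (log=TRUE by default)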

I'm interested in estimating the covariance of $Y=\sqrt{\frac{v-2}{v}}X$ which, for $v>2$, has covariance
$$V[Y]=V\left[\sqrt{\frac{v-2}{v}}X\right]=\frac{v-2}{v}V[X]=\Sigma$$
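
This identity is easy to check by simulation with a genuine multivariate t sample, e.g. via mvtnorm::rmvt (mvtnorm is not used elsewhere here, it is just a convenient check; its sigma argument is the scale matrix $\Sigma$):

library(mvtnorm)
set.seed(1)
v <- 5
S <- matrix(c(1,0.5,0.5,1),2,2)
X <- rmvt(1e5, sigma=S, df=v)   # rows are draws of the multivariate t above
var(sqrt((v-2)/v)*X)            # should be close to S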

By the density transformation formula, the density of $Y$ is
$$f_Y(y)=\frac{1}{|\sqrt{\frac{v-2}{v}}|}f_X\left( y /\sqrt{\frac{v-2}{v}} \right)$$
which I have calculated to be
$$f_{Y}\left(y\right)=\frac{\Gamma\left(\frac{v+p}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{(v-2)v^{p-1}\pi^{p}\left|\det\Sigma \right|}}\left(1+\frac{1}{v-2}y^{\prime}\Sigma^{-1}y\right)^{-\frac{v+p}{2}}$$
so my best guess at the log-likelihood is:
$$L(\theta)=-\frac{1}{2}\times\sum_{i=1}^N\left[-2\delta(v)+p\log(\pi)+\log(v-2)+(p-1)\log(v)+\log(\det(\Sigma))+(v+p)\log\left(1+\frac{1}{v-2}y_i^{\prime}\Sigma^{-1}y_i \right) \right]$$
where $\delta(v)=\log\frac{\Gamma((v+p)/2)}{\Gamma(v/2)}$.

Are the derivations of the log-likelihood correct? Is this the likelihood of a variable $Y$ with covariance $\Sigma$, and can I expect the marginals to be (scaled) univariate t-distributions with variances $\text{diag}(\Sigma)$?

I have implemented this likelihood in R and tried to estimate it on simulated data, but when I simulate with, for example, $v=5$, I get a degrees-of-freedom estimate of around 50 to 60, as in the example below.

I'm quite confident about both the log-likelihood and the R code, but the results don't add up, so clearly one or the other (or both) is wrong! Can you help me figure out what I have misunderstood?

Thank you in advance!

The simulation in R as an illustration:

N <- 10000
v <- 5
set.seed(123)
sigma <- matrix(c(1, 0.5,
                  0.5, 1), 2, 2, byrow = TRUE)

U <- chol(sigma)               # upper Cholesky factor: sigma = t(U) %*% U
Z <- matrix(rt(N*2, v), N, 2)  # independent t(v) draws in each column
Y <- Z %*% U * sqrt((v-2)/v)   # induce correlation, then rescale

# Check covariance
var(Y)
#          [,1]      [,2]
#[1,] 0.9764366 0.4701139
#[2,] 0.4701139 0.9515428

# Defining the (negative) log-likelihood function
lik.t <- function(theta,data){
  # Number of obs
  N <- dim(data)[1]
  # Number of dimensions (set to 2)
  p <- 2

  # Empty vector for likelihood values
  lik <- rep(NA,N)

  # Lower Cholesky parametrization
  L <- matrix(c(theta[1:2],0,theta[3]),2,2)
  sigma <- L%*%t(L)

  # Ensuring v>2
  v <- exp(theta[4])+2

  delta <- lgamma((v+p)/2) - lgamma(v/2)  # lgamma avoids overflow of gamma()
  for(i in 1:N){
    lik[i] <- 1/2*(-2*delta + p*log(pi) + log(v-2) + (p-1)*log(v) + log(det(sigma)) +
              (v+p)*log(1 + 1/(v-2)*t(data[i,])%*%solve(sigma)%*%data[i,]))
  }
  sum(lik)
}
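
As a sanity check on the loop, the same expression can be written in vectorized form using lgamma and mahalanobis (lik.t.vec is just an illustrative name); it should return the same value as lik.t up to numerical precision:

lik.t.vec <- function(theta,data){
  p <- ncol(data)
  L <- matrix(c(theta[1:2],0,theta[3]),2,2)
  sigma <- L%*%t(L)
  v <- exp(theta[4])+2
  delta <- lgamma((v+p)/2)-lgamma(v/2)
  md <- mahalanobis(data, center=rep(0,p), cov=sigma)  # y' sigma^{-1} y per row
  0.5*sum(-2*delta + p*log(pi) + log(v-2) + (p-1)*log(v) +
          log(det(sigma)) + (v+p)*log(1 + md/(v-2)))
}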

# Collecting true parameters as starting values for BFGS
foo <- chol(sigma)[upper.tri(sigma,TRUE)]
theta0 <- c(foo, log(v-2))  # v is parametrized as exp(theta[4]) + 2

# Minimizing minus the log-likelihood
out <- optim(theta0,lik.t,data=Y,method="BFGS",control=list(trace=1,REPORT=1))

# Printing results
L.hat <- matrix(0,2,2)
L.hat[lower.tri(L.hat,T)] <- out$par[1:3]
sigma.hat <- L.hat%*%t(L.hat)
v.hat <- exp(out$par[4])+2

sigma.hat  # Estimated covariance
#         [,1]      [,2]
#[1,] 0.9347044 0.4534481
#[2,] 0.4534481 0.9088294
v.hat      # Estimated degrees of freedom
#[1] 51.53331

Best Answer

The EM algorithm is typically used to find MLEs of the parameters from an iid multivariate t sample. Instead of writing a long answer on how to implement the algorithm, I refer you to McLachlan and Krishnan, The EM Algorithm and Extensions, second edition. They show the MLE procedure for both known and unknown degrees of freedom, as well as ways to accelerate the algorithm. If you don't have access to the book, you can also look at this paper:

https://arxiv.org/pdf/1707.01130.pdf

It turns out that estimating the degrees of freedom by ML results in an unbounded score function. The paper above instead derives maximum Lq-likelihood (MLq) estimators, which have a bounded score. The authors also give a good review of the EM algorithm for the usual ML procedure.

Once you have estimates of both the degrees of freedom and the scale matrix, finding an estimate of the covariance is straightforward.
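
To give a flavour of the algorithm, here is a minimal sketch of the EM iteration for the scale matrix when the degrees of freedom are held fixed (a zero-mean sample is assumed, Y is as in the question's simulation, and em.t.scale is just an illustrative name, not the book's implementation):

em.t.scale <- function(Y, v, iter=100){
  N <- nrow(Y); p <- ncol(Y)
  Sigma <- var(Y)                          # starting value
  for(k in 1:iter){
    d <- mahalanobis(Y, rep(0,p), Sigma)   # E-step: squared Mahalanobis distances
    w <- (v+p)/(v+d)                       # E-step: conditional weights
    Sigma <- crossprod(sqrt(w)*Y)/N        # M-step: weighted scatter matrix
  }
  Sigma
}

v <- 5                           # degrees of freedom treated as known here
Sigma.hat <- em.t.scale(Y, v)
Sigma.hat*v/(v-2)                # covariance = v/(v-2) * scale, valid for v > 2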